Computing gradients in a custom training loop: performance difference between TF and Torch



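I am trying to convert a PyTorch implementation of a neural network model that computes forces and energies of molecular structures to TensorFlow. This requires a custom training loop and a custom loss function, so I implemented the two different one-step training functions below.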

  1. First attempt, using nested gradient tapes
import tensorflow as tf

# `model` and `force_energy_loss` are defined elsewhere in the script.
def calc_gradients(D_train_batch, dD_dr_train_batch, E_train_batch, F_train_batch, opt):

    #set up gradient tape scope in order to track gradients of both d(Loss)/d(Weights)
    #and d(output)/d(input)
    with tf.GradientTape() as tape1:
        with tf.GradientTape() as tape2:
            #set gradient tape to watch Tensor
            tape2.watch(D_train_batch)
            #pass D thru model to get predicted energy vals
            E_pred = model(D_train_batch, training=True)

        #still inside tape1, so this gradient is recorded and can be differentiated again
        df_dD_train_batch = tape2.gradient(E_pred, D_train_batch)
        #matrix mult of -Grad_D(f) x Grad_r(D)
        F_pred = -tf.einsum('ijkl,il->ijk', dD_dr_train_batch, df_dD_train_batch)
        #calculate loss value
        loss = force_energy_loss(E_pred, F_pred, E_train_batch, F_train_batch)

    grads = tape1.gradient(loss, model.trainable_weights)
    opt.apply_gradients(zip(grads, model.trainable_weights))
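A minimal, self-contained sketch of the nested-tape pattern, assuming a toy two-layer Keras model and random data with the shapes implied by the einsum string (i = samples, j = atoms, k = Cartesian components, l = descriptor features). The sizes, the model, and the unweighted MSE loss are placeholders, not the question's actual model or `force_energy_loss`. The point it illustrates is that the inner `gradient` call happens inside the outer tape's context, which is the TensorFlow counterpart of `create_graph=True` in PyTorch.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical sizes: 8 samples, 4 atoms, 10 descriptor features.
n_samples, n_atoms, n_features = 8, 4, 10

# Toy energy model (an assumption, not the model from the question).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation="tanh"),
    tf.keras.layers.Dense(1),
])

# Synthetic batch: descriptors D, descriptor derivatives dD/dr,
# reference energies and forces.
D = tf.random.normal((n_samples, n_features))
dD_dr = tf.random.normal((n_samples, n_atoms, 3, n_features))
E_ref = tf.random.normal((n_samples, 1))
F_ref = tf.random.normal((n_samples, n_atoms, 3))

def nested_tape_grads(D, dD_dr, E_ref, F_ref):
    with tf.GradientTape() as outer:          # records ops for d(loss)/d(weights)
        with tf.GradientTape() as inner:      # records ops for d(E)/d(D)
            inner.watch(D)                    # D is a plain tensor, so watch it
            E_pred = model(D, training=True)
        # Computed inside `outer`, so the force term stays differentiable
        # with respect to the weights.
        dE_dD = inner.gradient(E_pred, D)
        F_pred = -tf.einsum("ijkl,il->ijk", dD_dr, dE_dD)
        loss = (tf.reduce_mean((E_pred - E_ref) ** 2)
                + tf.reduce_mean((F_pred - F_ref) ** 2))
    grads = outer.gradient(loss, model.trainable_weights)
    return loss, F_pred, grads

loss, F_pred, grads = nested_tape_grads(D, dD_dr, E_ref, F_ref)
print(F_pred.shape)
```

If `inner.gradient` were moved outside the `outer` block, the force part of the loss would no longer contribute to `grads`, the same silent failure the PyTorch comment warns about.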
  2. Second attempt, using a single persistent gradient tape (persistent=True)
# `model` and `force_energy_loss` are defined elsewhere in the script.
def calc_gradients_persistent(D_train_batch, dD_dr_train_batch, E_train_batch, F_train_batch, opt):
    #set up gradient tape scope in order to track gradients of both d(Loss)/d(Weights)
    #and d(output)/d(input)
    with tf.GradientTape(persistent=True) as outer:

        #set gradient tape to watch Tensor
        outer.watch(D_train_batch)

        #output values from model; training=True runs the layers in training mode
        E_pred = model(D_train_batch, training=True)

        #set gradient tape to watch trainable weights
        #(redundant: trainable variables are watched automatically)
        outer.watch(model.trainable_weights)

        #get gradient of output (f/E_pred) w.r.t. input (D/D_train_batch)
        df_dD_train_batch = outer.gradient(E_pred, D_train_batch)

        #matrix mult of -Grad_D(f) x Grad_r(D)
        F_pred = -tf.einsum('ijkl,il->ijk', dD_dr_train_batch, df_dD_train_batch)
        #calculate loss value
        loss = force_energy_loss(E_pred, F_pred, E_train_batch, F_train_batch)

    #get gradient of loss w.r.t. trainable weights for backpropagation
    grads = outer.gradient(loss, model.trainable_weights)
    #release the resources held by the persistent tape
    del outer

    #update weights using the optimizer and the gradients (grads)
    opt.apply_gradients(zip(grads, model.trainable_weights))
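A sketch for checking that the persistent-tape variant computes the same gradients as the nested one, again on a hypothetical toy model and random data. TensorFlow's documentation notes that calling `gradient` on a persistent tape inside its context is much less efficient, because the gradient ops themselves get recorded; here that recording is needed for the force term, but having one tape record everything may contribute to the slowdown reported in the update.

```python
import tensorflow as tf

tf.random.set_seed(0)
n_samples, n_atoms, n_features = 8, 4, 10

# Toy model and synthetic data (assumptions for illustration only).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation="tanh"),
    tf.keras.layers.Dense(1),
])
D = tf.random.normal((n_samples, n_features))
dD_dr = tf.random.normal((n_samples, n_atoms, 3, n_features))
E_ref = tf.random.normal((n_samples, 1))
F_ref = tf.random.normal((n_samples, n_atoms, 3))

def loss_fn(E_pred, F_pred):
    # Unweighted energy + force MSE (placeholder for force_energy_loss).
    return (tf.reduce_mean((E_pred - E_ref) ** 2)
            + tf.reduce_mean((F_pred - F_ref) ** 2))

def persistent_tape_grads():
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(D)
        E_pred = model(D, training=True)
        # Inside the context, so the force computation is recorded too.
        dE_dD = tape.gradient(E_pred, D)
        F_pred = -tf.einsum("ijkl,il->ijk", dD_dr, dE_dD)
        loss = loss_fn(E_pred, F_pred)
    grads = tape.gradient(loss, model.trainable_weights)  # outside the context
    del tape  # free the persistent tape's resources
    return grads

def nested_tape_grads():
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            inner.watch(D)
            E_pred = model(D, training=True)
        dE_dD = inner.gradient(E_pred, D)
        F_pred = -tf.einsum("ijkl,il->ijk", dD_dr, dE_dD)
        loss = loss_fn(E_pred, F_pred)
    return outer.gradient(loss, model.trainable_weights)

g_persistent = persistent_tape_grads()
g_nested = nested_tape_grads()
# Largest element-wise difference between the two gradient lists.
max_diff = max(float(tf.reduce_max(tf.abs(a - b)))
               for a, b in zip(g_persistent, g_nested))
print(max_diff)
```

The two gradient lists should agree up to floating-point noise, consistent with the conclusion that both TF versions are equivalent.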

These are my attempted translations of the following PyTorch code:

import torch

# D_train_batch must have requires_grad=True so that gradients w.r.t. it can be taken.
# Forward pass: Predict energies from the descriptor input
E_train_pred_batch = model(D_train_batch)
# Get derivatives of model output with respect to input variables. The
# torch.autograd.grad function can be used for this, as it returns the
# gradients of the outputs with respect to the inputs. It is very important
# to set create_graph=True in this case. Without it, the derivatives of the
# force-error part of the loss with respect to the NN parameters
# will not be populated (= the force error will not affect the
# training), but the model will still run fine without errors.
df_dD_train_batch = torch.autograd.grad(
    outputs=E_train_pred_batch,
    inputs=D_train_batch,
    grad_outputs=torch.ones_like(E_train_pred_batch),
    create_graph=True,
)[0]
# Combine with the derivatives of the input variables (= descriptor) with
# respect to the atom positions to get the forces
F_train_pred_batch = -torch.einsum('ijkl,il->ijk', dD_dr_train_batch, df_dD_train_batch)
# Zero gradients, perform a backward pass, and update the weights.
# D_train_batch.grad.data.zero_()
optimizer.zero_grad()
loss = energy_force_loss(E_train_pred_batch, E_train_batch, F_train_pred_batch, F_train_batch)
loss.backward()
optimizer.step()
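One detail of the translation worth checking: `torch.autograd.grad` with `grad_outputs=torch.ones_like(E)` computes the gradient of `sum(E)` with respect to `D`, which is also what `tf.GradientTape.gradient` does by default for a non-scalar target. Because each row of `E` depends only on the same row of `D`, that summed gradient equals the per-sample derivative. A small sketch on a hypothetical toy model, using `output_gradients` and `batch_jacobian` to confirm both points:

```python
import tensorflow as tf

tf.random.set_seed(0)

# Toy model and random descriptors (assumptions for illustration).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="tanh"),
    tf.keras.layers.Dense(1),
])
D = tf.random.normal((8, 10))

with tf.GradientTape(persistent=True) as tape:
    tape.watch(D)
    E = model(D)  # shape (8, 1)

# Default behaviour: gradient of sum(E) with respect to D.
g_default = tape.gradient(E, D)
# Explicit equivalent of PyTorch's grad_outputs=torch.ones_like(E).
g_ones = tape.gradient(E, D, output_gradients=tf.ones_like(E))
# Per-sample Jacobian dE_i/dD_i, shape (8, 1, 10).
jac = tape.batch_jacobian(E, D)
del tape
```

TensorFlow has no `create_graph` flag; keeping the first `gradient` call inside an outer tape's context plays that role.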

This is from the DScribe library tutorial: https://singroup.github.io/dscribe/latest/tutorials/machine_learning/forces_and_energies.html

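Either TF implementation loses a great deal of prediction accuracy compared with the PyTorch version. I am wondering whether I misunderstood the PyTorch code and translated it incorrectly; if so, where does my version differ?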

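P.S. The model computes the energy E directly, and the forces F are then obtained from the gradient of E with respect to D. The loss function is a weighted sum of the mean squared errors of the forces and the energies.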
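Written out, the quantities in the P.S. are as follows. The index meanings are inferred from the einsum string `'ijkl,il->ijk'` (i = sample, j = atom, k = Cartesian component, l = descriptor feature); N samples, M atoms, and the weights w_E and w_F are placeholders, since the question does not state them. Note also that the PyTorch code calls `energy_force_loss(E_pred, E, F_pred, F)` while the TF versions call `force_energy_loss(E_pred, F_pred, E, F)`, so each loss definition has to match its own argument order.

```latex
\begin{aligned}
E_i^{\mathrm{pred}} &= f(\mathbf{D}_i), \\
F_{ijk}^{\mathrm{pred}} &= -\frac{\partial E_i^{\mathrm{pred}}}{\partial r_{ijk}}
  = -\sum_{l} \frac{\partial D_{il}}{\partial r_{ijk}} \, \frac{\partial f}{\partial D_{il}}, \\
\mathcal{L} &= \frac{w_E}{N} \sum_{i} \left(E_i^{\mathrm{pred}} - E_i\right)^2
  + \frac{w_F}{3NM} \sum_{i,j,k} \left(F_{ijk}^{\mathrm{pred}} - F_{ijk}\right)^2 .
\end{aligned}
```

Here `dD_dr_train_batch[i, j, k, l]` plays the role of the first factor in the sum and `df_dD_train_batch[i, l]` the second.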

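The two methods are in fact equivalent; my error was elsewhere and was what produced the different results. For anyone trying to implement the TensorFlow version: the nested gradient tapes were about 2x faster, at least in this case. Also make sure to wrap the function in @tf.function so that it runs as a graph rather than eagerly, which was about 10x faster.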
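A sketch of that advice: the nested-tape step wrapped in `@tf.function`, with `dD_dr` passed as an argument rather than read from a global (a global tensor may be captured when the function is traced and not refreshed on later batches). The model, the Adam settings, the unweighted loss, and the data are hypothetical. Calling the function repeatedly with tensors of the same shape and dtype reuses one traced graph; Python scalars or changing shapes would trigger retracing.

```python
import tensorflow as tf

tf.random.set_seed(0)
n_samples, n_atoms, n_features = 8, 4, 10

# Toy model, optimizer, and synthetic batch (assumptions for illustration).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation="tanh"),
    tf.keras.layers.Dense(1),
])
opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
D = tf.random.normal((n_samples, n_features))
dD_dr = tf.random.normal((n_samples, n_atoms, 3, n_features))
E_ref = tf.random.normal((n_samples, 1))
F_ref = tf.random.normal((n_samples, n_atoms, 3))

@tf.function  # traced into a graph on the first call, then reused
def train_step(D, dD_dr, E_ref, F_ref):
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            inner.watch(D)
            E_pred = model(D, training=True)
        dE_dD = inner.gradient(E_pred, D)
        F_pred = -tf.einsum("ijkl,il->ijk", dD_dr, dE_dD)
        loss = (tf.reduce_mean((E_pred - E_ref) ** 2)
                + tf.reduce_mean((F_pred - F_ref) ** 2))
    grads = outer.gradient(loss, model.trainable_weights)
    opt.apply_gradients(zip(grads, model.trainable_weights))
    return loss  # loss before this step's update

# Repeated steps on one fixed batch, just to exercise the graph.
losses = [float(train_step(D, dD_dr, E_ref, F_ref)) for _ in range(50)]
```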
