Train loss is not decreasing (converting PyTorch code to TensorFlow)



I am training a model with Hugging Face transformers and RoBERTa. After a few steps the train loss stops decreasing, and I cannot figure out why. Any suggestion would be appreciated.

Here is the model:

class TriggerExtractor(keras.Model):
    def __init__(self,
                 bert_dir,
                 dropout_prob=0.1,
                 use_distant_trigger=True,
                 **kwargs):
        super(TriggerExtractor, self).__init__()
        config_path = os.path.join(bert_dir, 'config.json')
        assert os.path.exists(bert_dir) and os.path.exists(config_path), \
            'pretrained bert file does not exist'
        self.bert_module = TFBertModel.from_pretrained(bert_dir)
        self.bert_config = self.bert_module.config
        self.use_distant_trigger = use_distant_trigger

        out_dims = self.bert_config.hidden_size
        if use_distant_trigger:
            embedding_dim = kwargs.pop('embedding_dims', 256)
            self.distant_trigger_embedding = keras.layers.Embedding(
                input_dim=3, output_dim=embedding_dim,
                embeddings_initializer=keras.initializers.HeNormal())
            out_dims += embedding_dim

        mid_linear_dims = kwargs.pop('mid_linear_dims', 128)
        self.mid_linear = keras.Sequential([
            keras.layers.Dense(mid_linear_dims, input_shape=(out_dims,), activation=None),
            keras.layers.ReLU(),
            keras.layers.Dropout(dropout_prob)
        ])
        self.classifier = keras.layers.Dense(2, input_shape=(mid_linear_dims,), activation=None)
        self.criterion = keras.losses.BinaryCrossentropy()

    def call(self, inputs):
        # print('inputs:', inputs)
        token_ids = inputs['token_ids']
        attention_masks = inputs['attention_masks']
        token_type_ids = inputs['token_type_ids']
        distant_trigger = inputs['distant_trigger']
        labels = inputs['labels']

        bert_outputs = self.bert_module(
            input_ids=token_ids,
            attention_mask=attention_masks,
            token_type_ids=token_type_ids
        )
        seq_out = bert_outputs[0]

        if self.use_distant_trigger:
            assert distant_trigger is not None, \
                'When using distant trigger features, distant trigger should be implemented'
            distant_trigger_feature = self.distant_trigger_embedding(distant_trigger)
            seq_out = keras.layers.concatenate([seq_out, distant_trigger_feature], axis=-1)

        seq_out = self.mid_linear(seq_out)
        logits = keras.activations.sigmoid(self.classifier(seq_out))

        out = (logits,)
        if labels is not None:
            labels = tf.cast(labels, dtype=tf.float32)
            loss = self.criterion(logits, labels)
            out = (loss,) + out
        return out

Here is the training code:

train_loader = tf.data.Dataset.from_tensor_slices(train_dataset.__dict__) \
    .shuffle(10000).batch(opt.train_batch_size)

for epoch in range(opt.train_epochs):
    for step, batch_data in enumerate(train_loader):
        with tf.GradientTape() as tape:
            loss = model(batch_data)
        grads = tape.gradient(loss, model.variables)
        # for (grad, var) in zip(grads, model.variables):
        #     if grad is not None:
        #         name = var.name
        #         space = name.split('/')
        #         if space[0] == 'tf_bert_model':
        #             optimizer_bert.apply_gradients([(tf.clip_by_norm(grad, opt.max_grad_norm), var)])
        #         else:
        #             optimizer_other.apply_gradients([(tf.clip_by_norm(grad, opt.max_grad_norm), var)])
        optimizer.apply_gradients([
            (tf.clip_by_norm(grad, opt.max_grad_norm), var)
            for (grad, var) in zip(grads, model.variables)
            if grad is not None
        ])

        global_step += 1
        if global_step % log_loss_steps == 0:
            avg_loss /= log_loss_steps
            logger.info('epoch:%d Step: %d / %d ----> total loss: %.5f'
                        % (epoch, global_step, t_total, avg_loss))
            avg_loss = 0.
        else:
            avg_loss += loss[0]
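The commented-out block above routes gradients to two optimizers depending on whether a variable belongs to the BERT backbone, matching on the first component of the variable name. The partition itself can be sketched independently of TensorFlow (the function name and the sample variable names below are illustrative only):

```python
def split_grads_by_scope(grads_and_vars, bert_scope="tf_bert_model"):
    # Partition (gradient, variable-name) pairs into the BERT group and
    # the rest, matching on the first path component of the name.
    bert, other = [], []
    for grad, name in grads_and_vars:
        if grad is None:
            continue  # frozen or unused variables produce no gradient
        group = bert if name.split("/")[0] == bert_scope else other
        group.append((grad, name))
    return bert, other

pairs = [(0.1, "tf_bert_model/encoder/layer_0/kernel"),
         (0.2, "dense/kernel"),
         (None, "dense/bias")]
bert, other = split_grads_by_scope(pairs)
```

Each returned group would then be passed to its own `apply_gradients` call, which is how layer-wise learning rates (a smaller one for the pretrained backbone) are usually implemented.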

The output is:

07/13/2021 20:09:22 - INFO - src_final.utils.trainer -   ***** Running training *****
07/13/2021 20:09:22 - INFO - src_final.utils.trainer -     Num Epochs = 10
07/13/2021 20:09:22 - INFO - src_final.utils.trainer -     Total training batch size = 8
07/13/2021 20:09:22 - INFO - src_final.utils.trainer -     Total optimization steps = 3070
07/13/2021 20:09:22 - INFO - src_final.utils.trainer -   Save model in 307 steps; Eval model in 307 steps
07/13/2021 20:09:36 - INFO - src_final.utils.trainer -   epoch:0 Step: 20 / 3070 ----> total loss: 1.73774
07/13/2021 20:09:50 - INFO - src_final.utils.trainer -   epoch:0 Step: 40 / 3070 ----> total loss: 0.04631
07/13/2021 20:10:03 - INFO - src_final.utils.trainer -   epoch:0 Step: 60 / 3070 ----> total loss: 0.04586
07/13/2021 20:10:17 - INFO - src_final.utils.trainer -   epoch:0 Step: 80 / 3070 ----> total loss: 0.04734
07/13/2021 20:10:31 - INFO - src_final.utils.trainer -   epoch:0 Step: 100 / 3070 ----> total loss: 0.04554
07/13/2021 20:10:44 - INFO - src_final.utils.trainer -   epoch:0 Step: 120 / 3070 ----> total loss: 0.04733
07/13/2021 20:10:58 - INFO - src_final.utils.trainer -   epoch:0 Step: 140 / 3070 ----> total loss: 0.04613
07/13/2021 20:11:12 - INFO - src_final.utils.trainer -   epoch:0 Step: 160 / 3070 ----> total loss: 0.04643
07/13/2021 20:11:26 - INFO - src_final.utils.trainer -   epoch:0 Step: 180 / 3070 ----> total loss: 0.04613
07/13/2021 20:11:39 - INFO - src_final.utils.trainer -   epoch:0 Step: 200 / 3070 ----> total loss: 0.04643
07/13/2021 20:11:53 - INFO - src_final.utils.trainer -   epoch:0 Step: 220 / 3070 ----> total loss: 0.04553
07/13/2021 20:12:07 - INFO - src_final.utils.trainer -   epoch:0 Step: 240 / 3070 ----> total loss: 0.04582
07/13/2021 20:12:21 - INFO - src_final.utils.trainer -   epoch:0 Step: 260 / 3070 ----> total loss: 0.04642
07/13/2021 20:12:35 - INFO - src_final.utils.trainer -   epoch:0 Step: 280 / 3070 ----> total loss: 0.04582
07/13/2021 20:12:48 - INFO - src_final.utils.trainer -   epoch:0 Step: 300 / 3070 ----> total loss: 0.04672
07/13/2021 20:12:53 - INFO - src_final.utils.trainer -   Saving model & optimizer & scheduler checkpoint to ./out/final/trigger/roberta_wwm_distant_trigger_pgd/checkpoint-307
07/13/2021 20:13:05 - INFO - src_final.utils.trainer -   epoch:1 Step: 320 / 3070 ----> total loss: 0.04582
07/13/2021 20:13:18 - INFO - src_final.utils.trainer -   epoch:1 Step: 340 / 3070 ----> total loss: 0.04552
07/13/2021 20:13:32 - INFO - src_final.utils.trainer -   epoch:1 Step: 360 / 3070 ----> total loss: 0.04672
07/13/2021 20:13:46 - INFO - src_final.utils.trainer -   epoch:1 Step: 380 / 3070 ----> total loss: 0.04762
07/13/2021 20:13:59 - INFO - src_final.utils.trainer -   epoch:1 Step: 400 / 3070 ----> total loss: 0.04642
07/13/2021 20:14:13 - INFO - src_final.utils.trainer -   epoch:1 Step: 420 / 3070 ----> total loss: 0.04612
07/13/2021 20:14:27 - INFO - src_final.utils.trainer -   epoch:1 Step: 440 / 3070 ----> total loss: 0.04582
07/13/2021 20:14:41 - INFO - src_final.utils.trainer -   epoch:1 Step: 460 / 3070 ----> total loss: 0.04702
07/13/2021 20:14:54 - INFO - src_final.utils.trainer -   epoch:1 Step: 480 / 3070 ----> total loss: 0.04672
07/13/2021 20:15:08 - INFO - src_final.utils.trainer -   epoch:1 Step: 500 / 3070 ----> total loss: 0.04672
07/13/2021 20:15:22 - INFO - src_final.utils.trainer -   epoch:1 Step: 520 / 3070 ----> total loss: 0.04552
07/13/2021 20:15:36 - INFO - src_final.utils.trainer -   epoch:1 Step: 540 / 3070 ----> total loss: 0.04552
07/13/2021 20:15:49 - INFO - src_final.utils.trainer -   epoch:1 Step: 560 / 3070 ----> total loss: 0.04672
07/13/2021 20:16:03 - INFO - src_final.utils.trainer -   epoch:1 Step: 580 / 3070 ----> total loss: 0.04552

PS: For certain reasons I am converting PyTorch code to TensorFlow. The PyTorch version works fine (https://github.com/WuHuRestaurant/xf_event_extraction2020Top1). Thanks again.

I found the cause. It was the argument order of `y_pred` and `y_true`: PyTorch and TensorFlow differ. `torch.nn.BCELoss` is called as `criterion(input, target)` with the prediction first, while Keras's `BinaryCrossentropy` is called as `loss(y_true, y_pred)` with the label first, so `self.criterion(logits, labels)` passes the predictions where the labels are expected.
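The effect of the swap can be reproduced without either framework. The sketch below implements binary cross-entropy by hand with the Keras argument order (labels first, predictions clipped for stability; the helper name and the sample values are illustrative): called correctly the loss reflects prediction quality, while the swapped call treats the hard 0/1 labels as "predictions", hits the clipping epsilon, and returns a large, nearly constant value of the kind seen in the stuck training log.

```python
import math

def bce(y_true, y_pred, eps=1e-7):
    # Binary cross-entropy with the Keras convention: first argument is
    # the ground truth, second is the prediction (clipped to [eps, 1-eps]).
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

labels = [1.0, 0.0]
probs = [0.9, 0.2]   # reasonably good predictions

correct = bce(labels, probs)  # loss(y_true, y_pred): small, tracks quality
swapped = bce(probs, labels)  # loss(y_pred, y_true): dominated by the clip
```

Because the swapped loss barely depends on the model's outputs, its gradient carries almost no learning signal, which matches the loss plateauing after the first few steps.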
