使用' keras.Model '时,尺寸不匹配.拟合tensorflow的BERT



我按照微调BERT的指示用我自己的数据集(它有点大,大于20G)构建一个模型,然后采取步骤重新生成我的数据并从tf_record文件加载它们。我创建的training_dataset与指令

中的签名相同。
training_dataset.element_spec
({'input_word_ids': TensorSpec(shape=(32, 1024), dtype=tf.int32, name=None), 
'input_mask': TensorSpec(shape=(32, 1024), dtype=tf.int32, name=None), 
'input_type_ids': TensorSpec(shape=(32, 1024), dtype=tf.int32, name=None)}, 
TensorSpec(shape=(32,), dtype=tf.int32, name=None))

,其中batch_size为32,max_seq_length为1024。如说明所示,

The resulting tf.data.Datasets return (features, labels) pairs, as expected by keras.Model.fit

看起来一切都像预期的那样工作,(指令没有显示如何使用training_dataset)然而,下面的代码

bert_classifier.fit(
x = training_dataset, 
validation_data=test_dataset, # has the same signature just as training_dataset
batch_size=32,
epochs=epochs,
verbose=1,
)

遇到一个错误,对我来说似乎很奇怪,

Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/captain/project/dataload/train.py", line 81, in <module>
verbose=1,
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
*args, **kwds))
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
out = weak_wrapped_fn().__wrapped__(*args, **kwds)
File "/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 977, in wrapper
raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py:805 train_function  *
return step_function(self, iterator)
/home/captain/.local/lib/python3.7/site-packages/official/nlp/keras_nlp/layers/position_embedding.py:88 call  *
return tf.broadcast_to(position_embeddings, input_shape)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py:845 broadcast_to  **
"BroadcastTo", input=input, shape=shape, name=name)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:750 _apply_op_helper
attrs=attr_protos, op_def=op_def)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:592 _create_op_internal
compute_device)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3536 _create_op_internal
op_def=op_def)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:2016 __init__
control_input_ops, op_def)
/home/captain/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1856 _create_c_op
raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 512 and 1024 for '{{node bert_classifier/bert_encoder_1/position_embedding/BroadcastTo}} = 
BroadcastTo[T=DT_FLOAT, Tidx=DT_INT32](bert_classifier/bert_encoder_1/position_embedding/strided_slice_1, bert_classifier/bert_encoder_1/position_embedding/Shape)' 
with input shapes: [512,768], [3] and with input tensors computed as partial shapes: input[1] = [32,1024,768].

与512没有任何关系,我没有在我的代码中使用512。那么我的代码有什么问题以及如何解决这个问题?

他们基于从bert_config.json加载的bert_config_file创建了bert_classifier

bert_classifier, bert_encoder = bert.bert_models.classifier_model(bert_config, num_labels=2)

bert_config.json

{
'attention_probs_dropout_prob': 0.1,
'hidden_act': 'gelu',
'hidden_dropout_prob': 0.1,
'hidden_size': 768,
'initializer_range': 0.02,
'intermediate_size': 3072,
'max_position_embeddings': 512,
'num_attention_heads': 12,
'num_hidden_layers': 12,
'type_vocab_size': 2,
'vocab_size': 30522
}

根据此配置,hidden_size为768,max_position_embeddings为512,因此用于馈送BERT模型的输入数据必须与所描述的形状相同。它解释了为什么会出现形状不匹配的问题。

因此,为了使其工作,您必须将所有用于创建张量输入的行从1024更改为512

相关内容

  • 没有找到相关文章

最新更新