Problem connecting transformer output to CNN input in Keras



I need to build a transformer-based architecture in TensorFlow following the encoder-decoder approach, where the encoder is a pre-existing Hugging Face DistilBERT model and the decoder is a CNN.

Input: a text consisting of several phrases on a single line. Output: a code according to a classification standard. My data file has 7387 text-label pairs in TSV format:

text	code
This is example text number one. It might contain some other phrases.	C21
This is example text number two. It might contain some other phrases.	J45.1
This is example text number three. It might contain some other phrases.	A27

The rest of the code is:

text_file = "data/datafile.tsv"
with open(text_file) as f:
    lines = f.read().split("\n")[:-1]
text_and_code_pairs = []
for line in lines:
    text, code = line.split("\t")
    text_and_code_pairs.append((text, code))

random.shuffle(text_and_code_pairs)
num_val_samples = int(0.10 * len(text_and_code_pairs))
num_train_samples = len(text_and_code_pairs) - 3 * num_val_samples
train_pairs = text_and_code_pairs[:num_train_samples]
val_pairs = text_and_code_pairs[num_train_samples : num_train_samples + num_val_samples]
test_pairs = text_and_code_pairs[num_train_samples + num_val_samples :]
train_texts = [fst for (fst,snd) in train_pairs]
train_labels = [snd for (fst,snd) in train_pairs]
val_texts = [fst for (fst,snd) in val_pairs]
val_labels = [snd for (fst,snd) in val_pairs]
test_texts = [fst for (fst,snd) in test_pairs]
test_labels = [snd for (fst,snd) in test_pairs]
distilbert_encoder = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-multilingual-cased")
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_labels
))
test_dataset = tf.data.Dataset.from_tensor_slices((
    dict(test_encodings),
    test_labels
))
model = build_model(distilbert_encoder)
model.fit(train_dataset.batch(64), validation_data=val_dataset, epochs=3, batch_size=64)
model.predict(test_dataset, verbose=1)
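One caveat, as an aside: the labels here are raw strings such as C21, which cannot be fed directly to a model compiled with categorical_crossentropy. A minimal sketch of mapping each code to an integer class index (the code_to_idx name is mine, not from the post; with integer labels you would then use sparse_categorical_crossentropy instead):

```python
# Map each code string (e.g. "C21") to an integer class index.
# "codes" stands in for the train_labels list built above.
codes = ["C21", "J45.1", "A27", "C21"]
code_to_idx = {c: i for i, c in enumerate(sorted(set(codes)))}
int_labels = [code_to_idx[c] for c in codes]
print(int_labels)  # [1, 2, 0, 1]
```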

Finally, the build_model function:

def build_model(transformer, max_len=512):
    model = tf.keras.models.Sequential()
    # Encoder
    inputs = layers.Input(shape=(max_len,), dtype=tf.int32)
    distilbert = transformer(inputs)
    # LAYER - something missing here?
    # Decoder
    conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert)
    pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D)
    flat = tf.keras.layers.Flatten()(pooling)
    fc = tf.keras.layers.Dense(1255, activation='relu')(flat)
    softmax = tf.keras.layers.Dense(1255, activation='softmax')(fc)
    model = tf.keras.models.Model(inputs=inputs, outputs=softmax)
    model.compile(tf.keras.optimizers.Adam(learning_rate=5e-5), loss="categorical_crossentropy", metrics=['accuracy'])
    print(model.summary())
    return model

I managed to narrow down the likely location of the problem. After changing from the Sequential to the functional Keras API, I got the following error:

Traceback (most recent call last):
File "keras_transformer.py", line 99, in <module>
main()
File "keras_transformer.py", line 94, in main
model = build_model(distilbert_encoder)
File "keras_transformer.py", line 23, in build_model
conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert)
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 897, in __call__
self._maybe_build(inputs)
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2416, in _maybe_build
self.build(input_shapes)  # pylint:disable=not-callable
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 152, in build
input_shape = tensor_shape.TensorShape(input_shape)
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 771, in __init__
self._dims = [as_dimension(d) for d in dims_iter]
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 771, in <listcomp>
self._dims = [as_dimension(d) for d in dims_iter]
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 716, in as_dimension
return Dimension(value)
File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 200, in __init__
None)
File "<string>", line 3, in raise_from
TypeError: Dimension value must be integer or None or have an __index__ method, got 'last_hidden_state'

The error seems to lie in the connection between the transformer's output and the convolutional layer's input. Should I include another layer between them to adapt the transformer's output? If so, what would be the best option? I am using tensorflow==2.2.0, transformers==4.5.1 and Python 3.6.9.
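For context on that final TypeError: the transformer call returns a dict-like output object, so when Keras tries to iterate over it as if it were a shape, it gets the key name 'last_hidden_state' instead of integer dimensions. A stdlib-only illustration of the same failure mode (fake_output is a stand-in, not the real transformers output class):

```python
# Iterating a dict-like object yields its keys, not tensor dimensions --
# hence the complaint about the string 'last_hidden_state'.
fake_output = {"last_hidden_state": "a-tensor-of-shape-(None, 512, 768)"}
dims = [d for d in fake_output]
print(dims)  # ['last_hidden_state']
```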

I think the problem is calling the correct tensor for the TensorFlow layers after the distilbert instance. Because distilbert = transformer(inputs) returns an instance rather than a tensor, unlike TensorFlow layers where, for example, in pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D), pooling is the output tensor of the MaxPooling1D layer.

I solved your problem by calling the last_hidden_state variable of the distilbert instance (i.e. the output of the DistilBERT model), which will be your input to the next Conv1D layer.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # suppress Tensorflow messages
from transformers import TFDistilBertModel
import tensorflow as tf
distilbert_encoder = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")

def build_model(transformer, max_len=512):
    # model = tf.keras.models.Sequential()
    # Encoder
    inputs = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32)
    distilbert = transformer(inputs)
    # Decoder
    ###### !!!!!! #########
    conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert.last_hidden_state)
    ###### !!!!!! #########
    pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D)
    flat = tf.keras.layers.Flatten()(pooling)
    fc = tf.keras.layers.Dense(1255, activation='relu')(flat)
    softmax = tf.keras.layers.Dense(1255, activation='softmax')(fc)
    model = tf.keras.models.Model(inputs=inputs, outputs=softmax)
    model.compile(tf.keras.optimizers.Adam(learning_rate=5e-5), loss="categorical_crossentropy", metrics=['accuracy'])
    print(model.summary())
    return model

model = build_model(distilbert_encoder)

This returns:

Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 512)]             0         
_________________________________________________________________
tf_distil_bert_model (TFDist TFBaseModelOutput(last_hi 134734080 
_________________________________________________________________
conv1d (Conv1D)              (None, 503, 5)            38405     
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 251, 5)            0         
_________________________________________________________________
flatten (Flatten)            (None, 1255)              0         
_________________________________________________________________
dense (Dense)                (None, 1255)              1576280   
_________________________________________________________________
dense_1 (Dense)              (None, 1255)              1576280   
=================================================================
Total params: 137,925,045
Trainable params: 137,925,045
Non-trainable params: 0
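The decoder shapes and parameter counts in this summary can be checked with plain arithmetic (variable names are mine; 768 is DistilBERT's hidden size):

```python
max_len, hidden = 512, 768        # last_hidden_state is (None, 512, 768)
filters, kernel = 5, 10
conv_len = max_len - kernel + 1   # 'valid' Conv1D output length -> 503
pool_len = conv_len // 2          # MaxPooling1D(pool_size=2)    -> 251
flat = pool_len * filters         # Flatten                      -> 1255
conv_params = kernel * hidden * filters + filters  # 38405
dense_params = flat * 1255 + 1255                  # 1576280, for each Dense
print(conv_len, pool_len, flat, conv_params, dense_params)
```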

Note: I assume that by layers.Input in the build_model function you meant tf.keras.layers.Input.

I think you are right. The problem seems to be the input to the Conv1D layer.

According to the documentation, outputs.last_hidden_state has shape (batch_size, sequence_length, hidden_size),
while Conv1D expects an input of shape (batch_size, sequence_length).
Perhaps you can solve the problem by changing Conv1D to Conv2D, or by adding a Conv2D layer in between.
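If you did go the Conv2D route, the 3-D transformer output would first need an explicit channel dimension. A sketch of the shape change only, using a NumPy stand-in rather than a real model tensor:

```python
import numpy as np

# (batch, seq_len, hidden) -> (batch, seq_len, hidden, 1),
# the 4-D channels_last layout Conv2D expects.
x = np.zeros((2, 512, 768))      # stand-in for last_hidden_state
x4 = np.expand_dims(x, axis=-1)
print(x4.shape)  # (2, 512, 768, 1)
```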
