刚开始张量流
处理 imdb 数据集。过程:使用文本矢量化层进行文本编码并将其传递给嵌入层:
# Create a custom standardization function to strip HTML break tags '<br />'.
def custom_standardization(input_data):
lowercase = tf.strings.lower(input_data)
stripped_html = tf.strings.regex_replace(lowercase, '<br />', ' ')
return tf.strings.regex_replace(stripped_html,
'[%s]' % re.escape(string.punctuation), '')
# Vocabulary size and number of words in a sequence.
vocab_size = 10000
sequence_length = 100
# Use the text vectorization layer to normalize, split, and map strings to
# integers. Note that the layer uses the custom standardization defined above.
# Set maximum_sequence length as all samples are not of the same length.
vectorize_layer = TextVectorization(
standardize=custom_standardization,
max_tokens=vocab_size,
output_mode='int',
output_sequence_length=sequence_length)
# Make a text-only dataset (no labels) and call adapt to build the vocabulary.
text_ds = train_ds.map(lambda x, y: x)
vectorize_layer.adapt(text_ds)
然后,我尝试构建一个功能性 API:
embedding_dim=16
text_model_catprocess2 = vectorize_layer
text_model_embedd = tf.keras.layers.Embedding(vocab_size, embedding_dim, name = 'embedding')(text_model_catprocess2)
text_model_embed_proc = tf.keras.layers.Lambda(embedding_mean_standard)(text_model_embedd)
text_model_dense1 = tf.keras.layers.Dense(2, activation = 'relu')(text_model_embed_proc)
text_model_dense2 = tf.keras.layers.Dense(2, activation = 'relu')(text_model_dense1)
text_model_output = tf.keras.layers.Dense(1, activation = 'sigmoid')(text_model_dense2)
但是,这会给出以下错误:
~anaconda3libsite-packageskerasbackend.py in dtype(x)
1496
1497 """
-> 1498 return x.dtype.base_dtype.name
1499
1500
AttributeError: Exception encountered when calling layer "embedding" (type Embedding).
'str' object has no attribute 'base_dtype'
Call arguments received:
• inputs=<keras.layers.preprocessing.text_vectorization.TextVectorization object at 0x0000029B483AADC0>
在制作这样的顺序 API 后,它工作正常:
embedding_dim=16
modelcheck = tf.keras.Sequential([
vectorize_layer,
tf.keras.layers.Embedding(vocab_size, embedding_dim, name="embedding"),
tf.keras.layers.GlobalAveragePooling1D(),
tf.keras.layers.Dense(16, activation='relu'),
tf.keras.layers.Dense(1)
])
我不确定为什么会发生这种情况。函数式 API 是否需要具有输入?请帮忙!
您有两个选择。要么您使用Sequential
模型,它将像您确认的那样工作,因为您不必定义Input
层,要么您使用functional
API,您必须在其中定义Input
层:
embedding_dim = 16
text_model_input = tf.keras.layers.Input(dtype=tf.string, shape=(1,))
text_model_catprocess2 = vectorize_layer(text_model_input)
text_model_embedd = tf.keras.layers.Embedding(vocab_size, embedding_dim, name = 'embedding')(text_model_catprocess2)
text_model_embed_proc = tf.keras.layers.Lambda(embedding_mean_standard)(text_model_embedd)
text_model_dense1 = tf.keras.layers.Dense(2, activation = 'relu')(text_model_embed_proc)
text_model_dense2 = tf.keras.layers.Dense(2, activation = 'relu')(text_model_dense1)
text_model_output = tf.keras.layers.Dense(1, activation = 'sigmoid')(text_model_dense2)
model = tf.keras.Model(text_model_input, text_model_output)