TensorFlow: many sparse ops require sorted indices

I am trying to fine-tune BERT for document classification.

I first tokenize the documents to produce lists of input_ids, attention_mask and token_type_ids to feed my TFBertModel:

import tensorflow as tf
from tqdm import tqdm

def tokenize_sequences(tokenizer, max_length, corpus):
    input_ids = []
    token_type_ids = []
    attention_masks = []
    for i in tqdm(range(len(corpus))):
        encoded = tokenizer.encode_plus(
            corpus[i],
            max_length=max_length,
            add_special_tokens=True,
            padding='max_length',
            truncation=True,
            return_token_type_ids=True,
            return_attention_mask=True,  # add attention mask to not focus on pad tokens
        )
        input_ids.append(encoded["input_ids"])
        attention_masks.append(encoded["attention_mask"])
        token_type_ids.append(encoded["token_type_ids"])
    # Every sequence is padded/truncated to max_length, so these are dense 2-D tensors.
    input_ids = tf.convert_to_tensor(input_ids)
    attention_masks = tf.convert_to_tensor(attention_masks)
    token_type_ids = tf.convert_to_tensor(token_type_ids)
    #print(input_ids.shape, attention_masks.shape, token_type_ids.shape)
    return [input_ids, attention_masks, token_type_ids]

Then I try to fit my model:

x_train = tokenize_sequences(tokenizer, MAXLEN, corpus_train)
model = loadBertModel()
model.fit(
    x_train, y_bin_train,
    epochs=N_EPOCHS,
    verbose=1,
    batch_size=4,
)

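I get this error:

InvalidArgumentError: indices[3] = [1,5] is out of order. Many sparse ops require sorted indices. Use `tf.sparse.reorder` to create a correctly ordered copy.

I tried to fix this by following that suggestion, applying it to the input_ids, attention_masks and token_type_ids tensors returned by tokenize_sequences: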

input_ids = tf.sparse.reorder(input_ids)
attention_masks = tf.sparse.reorder(attention_masks)
token_type_ids = tf.sparse.reorder(token_type_ids)

But then another error occurred:

TypeError: Input must be a SparseTensor.

PS: When I checked the type of the tensors, I noticed they are <class 'tensorflow.python.framework.ops.EagerTensor'>.

Any ideas on how to solve this?
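For context on the second error: tf.sparse.reorder only accepts a tf.sparse.SparseTensor, while tf.convert_to_tensor returns dense tensors like the EagerTensor mentioned above. The sketch below uses made-up values (not the data from the question) and a hypothetical helper, is_sparse, to show the difference. It suggests that the out-of-order indices likely come from some other input passed to fit, although the question does not show enough to confirm which one.

```python
import tensorflow as tf

# A dense tensor, like the ones tf.convert_to_tensor returns in tokenize_sequences.
# The token ids here are illustrative only.
dense = tf.convert_to_tensor([[101, 2023, 102], [101, 102, 0]])

# A SparseTensor whose indices are deliberately NOT in row-major order.
unordered = tf.sparse.SparseTensor(
    indices=[[1, 0], [0, 2], [0, 0]],
    values=[7, 5, 3],
    dense_shape=[2, 3],
)

def is_sparse(x):
    """Hypothetical helper: True only for tf.sparse.SparseTensor inputs."""
    return isinstance(x, tf.sparse.SparseTensor)

# reorder is meaningful only for SparseTensor inputs; it sorts indices row-major.
reordered = tf.sparse.reorder(unordered)
reordered_indices = reordered.indices.numpy().tolist()

# Once the indices are sorted, the tensor can be densified safely.
dense_from_sparse = tf.sparse.to_dense(reordered).numpy().tolist()

print(is_sparse(dense), is_sparse(unordered))
print(reordered_indices)
print(dense_from_sparse)
```

A check like is_sparse on every argument passed to model.fit (inputs and labels) is one way to find out which of them is actually sparse.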

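I don't have enough reputation to comment, so I'm commenting by way of an answer. This question seems to be the same as yours:

Multi-class text classification TypeError: Input must be a SparseTensor or

In my case, I solved a similar problem by simply converting the input with .toarray() instead of trying to reorder it.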

input_ids = input_ids.toarray()
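Note that .toarray() is a method of SciPy sparse matrices, and an EagerTensor like the one described in the question does not have it. So the conversion presumably has to be applied to whichever object passed to fit is a SciPy sparse matrix. A minimal sketch, assuming (this is not shown in the question) that the sparse object is a label matrix such as y_bin_train; the data and the helper name to_dense_array are made up for illustration:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for a label matrix that was produced in sparse form.
y_sparse = sparse.csr_matrix(np.array([[0, 1, 0], [1, 0, 1]]))

def to_dense_array(x):
    """Return a dense NumPy array, converting SciPy sparse matrices if needed."""
    if sparse.issparse(x):
        return x.toarray()
    return np.asarray(x)

y_dense = to_dense_array(y_sparse)
print(type(y_dense).__name__, y_dense.shape)
```

The dense array can then be passed to model.fit in place of the sparse one. Keep in mind that densifying a very large sparse matrix can use a lot of memory.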
