TensorFlow word embedding model + LDA: Negative values in data passed to LatentDirichletAllocation.fit



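I am trying to use a pre-trained model from TensorFlow Hub for word embeddings, instead of a frequency-based vectorization technique, before passing the resulting feature vectors to an LDA model.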

I followed the steps for the TensorFlow model, but I got this error when passing the resulting feature vectors to the LDA model:

Negative values in data passed to LatentDirichletAllocation.fit

Here is what I have implemented so far:

import pandas as pd
import matplotlib.pyplot as plt
import tensorflow_hub as hub
from sklearn.decomposition import LatentDirichletAllocation
embed = hub.load("https://tfhub.dev/google/tf2-preview/nnlm-en-dim50-with-normalization/1")
embeddings = embed(["cat is on the mat", "dog is in the fog"])
lda_model = LatentDirichletAllocation(n_components=2, max_iter=50)
lda = lda_model.fit_transform(embeddings)

I noticed that print(embeddings) outputs some negative values, as shown below:

tf.Tensor(
[[ 0.16589954  0.0254965   0.1574857   0.17688066  0.02911299 -0.03092718
0.19445257 -0.05709129 -0.08631689 -0.04391516  0.13032274  0.10905275
-0.08515751  0.01056632 -0.17220995 -0.17925954  0.19556305  0.0802278
-0.03247919 -0.49176937 -0.07767699 -0.03160921 -0.13952136  0.05959712
0.06858718  0.22386682 -0.16653948  0.19412343 -0.05491862  0.10997339
-0.15811177 -0.02576607 -0.07910853 -0.258499   -0.04206644 -0.20052543
0.1705603  -0.15314153  0.0039225  -0.28694248  0.02468278  0.11069503
0.03733957  0.01433943 -0.11048374  0.11931834 -0.11552787 -0.11110869
0.02384969 -0.07074881]

Is there a way to work around this?

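Because the fit method of LatentDirichletAllocation does not accept arrays containing negative values, I suggest applying softplus to the embeddings. Softplus, defined as log(1 + exp(x)), maps every real number to a strictly positive value.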

Here is the code snippet:

import tensorflow as tf
import tensorflow_hub as hub
from sklearn.decomposition import LatentDirichletAllocation

# Load the pre-trained embedding model from TensorFlow Hub
embed = hub.load("https://tfhub.dev/google/tf2-preview/nnlm-en-dim50-with-normalization/1")
# Apply softplus so that every embedding value becomes strictly positive
embeddings = tf.math.softplus(embed(["cat is on the mat", "dog is in the fog"]))
lda_model = LatentDirichletAllocation(n_components=2, max_iter=50)
# Convert the tensor to a NumPy array before handing it to scikit-learn
lda = lda_model.fit_transform(embeddings.numpy())
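
To see what softplus does without downloading the Hub model, here is a minimal NumPy sketch. The `softplus` helper and the small `embeddings` array are hypothetical stand-ins for illustration (the values are loosely based on the printed tensor above, not real model output); `np.logaddexp(0, x)` is used as a numerically stable way to compute log(1 + exp(x)).

```python
import numpy as np


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x)), strictly positive."""
    return np.logaddexp(0.0, x)


# Hypothetical stand-in for an embedding matrix (2 sentences x 4 dimensions)
embeddings = np.array([
    [0.166, -0.031, -0.492, 0.224],
    [-0.086, 0.130, -0.172, 0.004],
])

non_negative = softplus(embeddings)

# Check for negative entries before and after the transformation
has_negatives_before = bool((embeddings < 0).any())
has_negatives_after = bool((non_negative < 0).any())
print(has_negatives_before, has_negatives_after)  # prints: True False
```

Softplus is monotonic, so the relative ordering of values within each vector is preserved. Keep in mind that this only satisfies the non-negativity check: LatentDirichletAllocation is usually fit on term-count matrices, so whether topics learned from transformed embeddings are meaningful should be evaluated separately.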
