Here is what I have tried so far:

1: Load the pre-trained model `GoogleNews-vectors-negative300.bin`:

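from gensim.models import KeyedVectors
# Newer gensim releases expose this loader on KeyedVectors rather than Word2Vec
model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
print("model loaded...")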

2: Build a vector for each tweet in the training set by averaging the vectors of all its words, then scale:

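import numpy as np

n_dim = 300  # assumed: the GoogleNews "negative300" vectors are 300-dimensional

def buildWordVector(text, size):
    # Average the vectors of all words in `text` that the model knows
    vec = np.zeros(size).reshape((1, size))
    count = 0.
    for word in text:
        try:
            vec += model[word].reshape((1, size))
            count += 1.
            # print("found! ", word)
        except KeyError:
            print("not found! ", word)  # missing words
            continue
    if count != 0:
        vec /= count
    return vec

# x_train is assumed to hold the tokenized training tweets (one list of words per tweet)
trained_vecs = np.concatenate([buildWordVector(z, n_dim) for z in x_train])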
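To see how the averaging in step 2 treats missing words without loading the full GoogleNews file, here is a minimal self-contained sketch. The dictionary `toy_vectors`, the helper `average_vector`, and the sample tweets are all hypothetical stand-ins, not part of the original code. A plain dict takes the place of the gensim model, and the helper also returns the words it could not find.

```python
import numpy as np

# Hypothetical stand-in for the pre-trained model: word -> 3-dimensional vector.
toy_vectors = {
    "good": np.array([1.0, 0.0, 2.0]),
    "day": np.array([3.0, 2.0, 0.0]),
}


def average_vector(tokens, vectors, size):
    """Return (mean vector of the known tokens, list of tokens with no vector)."""
    total = np.zeros(size)
    missing = []
    count = 0
    for token in tokens:
        if token in vectors:
            total += vectors[token]
            count += 1
        else:
            missing.append(token)
    if count:
        total /= count
    return total.reshape(1, size), missing


# Hypothetical tokenized tweets; "covfefe" has no vector in toy_vectors.
tweets = [["good", "day"], ["good", "covfefe"], ["covfefe"]]
rows, missing_words = [], set()
for tweet in tweets:
    vec, missing = average_vector(tweet, toy_vectors, size=3)
    rows.append(vec)
    missing_words.update(missing)

train_vecs = np.concatenate(rows)  # one row per tweet
print(train_vecs.shape)
print(sorted(missing_words))
```

Note that a tweet made up only of missing words ends up as an all-zero row, which is exactly the case where adding vectors for those words would matter.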

Could you please tell me how to add new words to a pre-trained Word2vec model?

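Edit 2019/06/07

As pointed out by @Oleg Melnikov and https://rare-technologies.com/word2vec-tutorial/#online_training__resuming, it is not possible to resume training without the vocab tree, which is not saved once training with the C code has finished:

Note that it is not possible to resume training with models generated by the C tool and loaded with load_word2vec_format(). You can still use them for querying/similarity, but information vital for training (the vocab tree) is missing.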
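Since vectors loaded this way can be queried but not trained further, one possible workaround is to leave them untouched and keep the missing words in a separate side table. The sketch below is only an illustration under that assumption, not something the tutorial prescribes. The class `FallbackVectors` and the toy `pretrained` dict are hypothetical names, and the random vectors it generates carry no semantic meaning. They only give each unknown word a stable, non-zero representation.

```python
import zlib

import numpy as np


class FallbackVectors:
    """Look up frozen pre-trained vectors, creating a vector for unknown words."""

    def __init__(self, pretrained, size, scale=0.1):
        self.pretrained = pretrained  # any mapping word -> vector; never modified
        self.size = size
        self.scale = scale
        self.extra = {}  # vectors created for out-of-vocabulary words

    def __getitem__(self, word):
        if word in self.pretrained:
            return self.pretrained[word]
        if word not in self.extra:
            # Seed from the word itself so the same word always gets the same vector.
            rng = np.random.default_rng(zlib.crc32(word.encode("utf-8")))
            self.extra[word] = rng.normal(0.0, self.scale, self.size)
        return self.extra[word]


pretrained = {"good": np.array([1.0, 0.0, 2.0])}  # hypothetical toy vectors
vectors = FallbackVectors(pretrained, size=3)

known = vectors["good"]       # comes from the pre-trained table
unknown = vectors["covfefe"]  # generated on first access, then cached
print(known, unknown.shape, sorted(vectors.extra))
```

Because the wrapper only needs `in` and `[]`, the same idea should carry over to a loaded gensim model, but that pairing is untested here.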

1: Get pre-trained vectors, e.g. Google News
2: Load the model in gensim
3: Continue training the model in gensim

These commands may come in handy:

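# Loading pre-trained vectors
# (older gensim API; newer releases expose this loader on KeyedVectors)
model = Word2Vec.load_word2vec_format('/tmp/vectors.bin', binary=True)
# Training the model with a list of sentences (with 4 CPU cores)
# Caveat from the edit above: vectors loaded this way lack the vocab tree, so training cannot actually be resumed
model.train(sentences, workers=4)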
