Why does TensorFlow's sampled_softmax_loss force you to use a bias, when experts recommend not using a bias for Word2Vec?

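Every TensorFlow implementation of Word2Vec that I have seen includes a bias term in the softmax loss function, including the one on the official TensorFlow website: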

https://www.tensorflow.org/tutorials/word2vec#vector-prementations-of-words

loss = tf.reduce_mean(
  tf.nn.nce_loss(weights=nce_weights,
                 biases=nce_biases,
                 labels=train_labels,
                 inputs=embed,
                 num_sampled=num_sampled,
                 num_classes=vocabulary_size))

This one is from Google's free deep learning course: https://github.com/tensorflow/tensorflow/blob/master/master/tensorflow/examples/udace/5_word2vec.ipynb

loss = tf.reduce_mean(
  tf.nn.sampled_softmax_loss(weights=softmax_weights,
                             biases=softmax_biases,
                             inputs=embed,
                             labels=train_labels,
                             num_sampled=num_sampled,
                             num_classes=vocabulary_size))

However, in the lectures by both Andrew Ng and Richard Socher, the softmax does not include a bias term.

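Even where the idea originated, Mikolov states:

"biases are not used in the neural network, as no significant improvement of performance was observed - following the Occam's razor, the solution is as simple as it needs to be."

Mikolov, T.: Statistical Language Models Based on Neural Networks, p. 29 http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf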

So why do the official TensorFlow implementations have a bias, and why does there seem to be no option to leave the bias out of the sampled_softmax_loss function?

The exercise you link to defines softmax_biases as all zeros:

softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))

That is, they are not using any actual bias in their word2vec example.

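The sampled_softmax_loss() function is generic and is used for many kinds of neural networks. Its decision to require a biases argument has nothing to do with what is best for one particular application (word2vec), and it accommodates the word2vec case by allowing, as here, a vector of all zeros.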
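To make the "all zeros" point concrete, here is a minimal pure-Python sketch (not TensorFlow code) of a full softmax over toy scores. The helper names logits_for and softmax and all the numeric values are made up for illustration; the only claim is that adding a zero bias vector leaves the output probabilities unchanged.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(hidden, weights, biases=None):
    """Score for class j is dot(weights[j], hidden), plus biases[j] if given."""
    scores = []
    for j, row in enumerate(weights):
        score = sum(w * h for w, h in zip(row, hidden))
        if biases is not None:
            score += biases[j]
        scores.append(score)
    return scores


# Toy values (hypothetical): a 3-dimensional embedding and a 4-word vocabulary.
hidden = [0.5, -1.0, 0.25]
weights = [
    [0.1, 0.2, 0.3],
    [-0.4, 0.5, 0.6],
    [0.7, -0.8, 0.9],
    [0.0, 0.1, -0.2],
]
zero_biases = [0.0] * len(weights)

p_no_bias = softmax(logits_for(hidden, weights))
p_zero_bias = softmax(logits_for(hidden, weights, zero_biases))

# The two distributions match: a zero bias is equivalent to no bias.
print(all(abs(a - b) < 1e-12 for a, b in zip(p_no_bias, p_zero_bias)))
```

One caveat worth checking in your own code: a tf.Variable is trainable by default, so biases that are initialized to zero stay at zero only if they are kept out of the optimizer's updates (for example, by creating the variable with trainable=False).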
