
Is there a way to get only the IDF values of words using scikit or any other python package?

Updated: 2023-09-11

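My dataset has a text column, and I want to compute the IDF of every word that appears in it. The TF-IDF implementations in scikit, such as TfidfVectorizer, give me TF-IDF values directly rather than just the word IDFs. Is there a way to get the word IDFs for a given set of documents?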

You can use TfidfVectorizer with use_idf=True (the default) and then read the values from its idf_ attribute.

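from sklearn.feature_extraction.text import TfidfVectorizer
my_data = ["hello how are you", "hello who are you", "i am not you"]
tf = TfidfVectorizer(use_idf=True)
# Fitting learns the vocabulary and the per-word IDF weights
tf.fit_transform(my_data)
# idf_ is an array with one IDF value per vocabulary entry
idf = tf.idf_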
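
To see which value belongs to which word, one option is to pair idf_ with vocabulary_, which maps each word to its index in idf_. The following is a minimal sketch that reuses the sample data above; the name idf_by_word is only illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

my_data = ["hello how are you", "hello who are you", "i am not you"]
tf = TfidfVectorizer(use_idf=True)
tf.fit(my_data)  # fitting alone is enough to populate idf_

# vocabulary_ maps word -> column index; idf_ is indexed by that column
idf_by_word = {word: float(tf.idf_[idx]) for word, idx in tf.vocabulary_.items()}

for word, value in sorted(idf_by_word.items()):
    print(f"{word}: {value:.4f}")
```

Note that "i" does not appear in the result: the default token_pattern only keeps tokens of two or more characters.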

[Bonus] If you want the idf value of a particular word:

# If you want to get the idf value for a particular word, here "hello"    
tf.idf_[tf.vocabulary_["hello"]]
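
As a sanity check, the same numbers can be reproduced without scikit-learn. With the default smooth_idf=True, scikit-learn documents the formula as idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing t. The sketch below assumes the default tokenization (lowercased tokens of two or more word characters); the helper name manual_idf is hypothetical and not part of any library.

```python
import math
import re
from collections import Counter

def manual_idf(docs):
    """Compute smoothed IDF per word: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        # Assumed tokenization: lowercase, tokens of 2+ word characters
        tokens = set(re.findall(r"(?u)\b\w\w+\b", doc.lower()))
        df.update(tokens)  # each document counts a word at most once
    return {word: math.log((1 + n) / (1 + count)) + 1 for word, count in df.items()}

my_data = ["hello how are you", "hello who are you", "i am not you"]
idf = manual_idf(my_data)
print(round(idf["hello"], 4))  # "hello" occurs in 2 of the 3 documents
```

A word that occurs in every document, such as "you" here, gets the minimum value of 1.0, and rarer words get larger values.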
