Is there a module/function in NLTK/sklearn that does basic analysis of text data?

Updated: 2023-09-07

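I have several text files, each containing exactly one document per line. I want to do some basic analysis of the text and answer questions such as:

Number of unigrams
Average length of a document
Standard deviation of the document length

and so on.

Is there a function for this in NLTK/sklearn? I'm open to other insights as well.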

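1) Number of unigrams

from itertools import tee

def bigrams(iterable):
    # Pair each item with its successor: (w0, w1), (w1, w2), ...
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)  # Python 3: the built-in zip replaces itertools.izip

with open("data.txt", 'r') as f:
    for line in f:  # one document per line
        words = line.strip().split()  # plain whitespace tokenization
        uni = words
        bi = bigrams(words)
        print(uni)
        print(list(bi))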
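
The snippet above prints the unigrams and bigrams of each line but does not count them. A minimal sketch of the counting step, assuming the same whitespace tokenization; the helper name `count_unigrams` and the sample documents are made up for illustration:

```python
from collections import Counter

def count_unigrams(lines):
    """Return (total token count, Counter of unigram frequencies)."""
    counts = Counter()
    for line in lines:
        # Whitespace tokenization is an assumption carried over from the
        # snippet above; a real tokenizer would treat punctuation differently.
        counts.update(line.split())
    return sum(counts.values()), counts

# Illustrative data: one document per line.
docs = ["the cat sat", "the dog sat down"]
total, counts = count_unigrams(docs)
print(total)        # total number of unigram tokens
print(len(counts))  # number of distinct unigrams
```

Since a file object yields its lines, `count_unigrams(f)` inside a `with open("data.txt") as f:` block should work the same way. NLTK's `FreqDist` is a `Counter` subclass, so it could be swapped in if NLTK is already a dependency.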

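2) Average sentence length

# `text` is assumed to hold the whole document as a single string
sents = [s for s in text.split('.') if s.strip()]  # drop empty fragments, e.g. after the final '.'
avg_len = sum(len(s.split()) for s in sents) / len(sents)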

3) Do it yourself! There is no ready-made API for this.
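
For the "do it yourself" part, the mean and standard deviation of document length can be sketched with the standard library alone. This assumes one document per line and length measured in whitespace-separated tokens; the helper name `length_stats` and the sample data are hypothetical:

```python
import statistics

def length_stats(lines):
    """Return (mean, standard deviation) of document lengths in tokens."""
    # Skip blank lines so they are not counted as zero-length documents.
    lengths = [len(line.split()) for line in lines if line.strip()]
    mean = statistics.mean(lengths)
    # pstdev is the population standard deviation; statistics.stdev
    # would give the sample standard deviation instead.
    sd = statistics.pstdev(lengths)
    return mean, sd

# Illustrative data: three documents of 3, 5 and 1 tokens.
docs = ["a b c", "a b c d e", "a"]
mean, sd = length_stats(docs)
print(mean, sd)
```

Whether the population or the sample standard deviation is the right choice depends on whether the files are the whole corpus or a sample of it.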
