Iterate over a column to create a dictionary and build a DataFrame



I am trying to iterate over a column to count how often each word appears in its sentences.

I have a column:

words
one two three four four six
from collections import Counter
import pandas as pd
# get the sentences as a plain list of strings
sentences = df['words'].values.tolist()
# split each sentence into words and flatten into one list
words = [i for j in sentences for i in j.split()]
# get (word, count) pairs, most frequent first
counts = Counter(words).most_common()
# make dataframe
result = pd.DataFrame(counts, columns=['Words', 'n'])
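Since the title asks for a dictionary, here is a minimal sketch of that intermediate step. Counter is a dict subclass, so it can be converted to a plain dict before building the frame. The sample `df` below is hypothetical, because the question's full column is not shown.

```python
from collections import Counter
import pandas as pd

# Hypothetical sample data; the original column is only partly shown.
df = pd.DataFrame({'words': ['one two three four', 'four six']})

# Count every word across all sentences, then convert to a plain dict.
word_counts = dict(Counter(w for s in df['words'] for w in s.split()))

# Build the two-column frame from the dictionary's (word, count) items.
result = pd.DataFrame(list(word_counts.items()), columns=['Words', 'n'])
```

Rows built this way follow first-seen order rather than frequency order; sort with `result.sort_values('n', ascending=False)` if that matters.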

You can split the column, then explode the resulting lists into rows. Finally, use value_counts to count the word frequencies.
# rename_axis/reset_index(name=...) yields the same column names on pandas 1.x and 2.x
out = (df['words'].str.split().explode().value_counts()
       .rename_axis('Words').reset_index(name='n'))
print(out)
       Words  n
0       four  2
1        one  1
2        two  1
3      three  1
4        six  1
5      seven  1
6      eight  1
7       nine  1
8        ten  1
9     eleven  1
10    twelve  1
11  thirteen  1
12  fourteen  1
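As a quick sanity check, the two answers should give the same counts. The sketch below runs both on a small hypothetical frame (not the asker's data) and compares the results as dictionaries.

```python
from collections import Counter
import pandas as pd

# Hypothetical sample data standing in for the asker's column.
df = pd.DataFrame({'words': ['one two three four', 'four six']})

# Counter-based counts from the first answer.
counter_counts = dict(Counter(w for s in df['words'] for w in s.split()))

# split/explode/value_counts from the second answer, as a word -> count dict.
pandas_counts = df['words'].str.split().explode().value_counts().to_dict()

# Dict equality ignores ordering, so this compares the counts only.
agree = counter_counts == pandas_counts
```

The order of tied words may differ between the two approaches, which is why the comparison is done on dictionaries rather than on the DataFrames.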
