Plotting a word cloud without stop words

I want to plot a word cloud from one column of a pandas DataFrame.

This is my code:

all_words=''.join(  [tweet for tweet in tweet_table['tokens'] ] ) 
word_Cloud=WordCloud(width=500, height=300, random_state=21, max_font_size=119).generate(all_words)
plt.imshow(word_Cloud, interpolation='bilinear')

The column I want to plot, tweet_table['tokens'], looks like this:

0        [da, trumpanzee, follower, blm, balance, wp, g...
1        [counting, blacklivesmatter, received, trainin...
2        [okay, like, little, kids, pretty, smart, know...
3        [thank, oscopelabs, got, mounted, loud, amp, p...
4        [bpi, proud, supported, hoops, 4l, f, e, see, ...
...                        
44713    [tomorrow, buy, charity, compilation, undergro...
44714    [needs, erected, state, capitol, think, darkfa...
44715    [clay, county, sheriffs, motto, screw, amp, in...
44716    [films, eleven, films, bravo, bad, ass, video,...
44717                       [everybody, give, listen, blm]

The code above gives me the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-227-4066d6d1a153> in <module>
2 # REMOVE STOP WORDS
3 
----> 4 all_words=''.join(  [tweet for tweet in tweet_table['tokens'] ] )

TypeError: sequence item 0: expected str instance, list found

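How can I fix this error? The column tweet_table['tokens'] is already tokenized and has had all stop words removed.

Thanks a lot.

PS: When I use similar code on the column tweet_table['clean_text'], it works fine.

The column tweet_table['clean_text'] looks like this: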

0            You have a da trumpanzee follower in      ...
1          Over 279  and counting   If  BlackLivesMatte...
2        Okay but like little kids are pretty smart and...
3        Thank you oscopelabs  got it mounted loud  amp...
4        BPI is proud to have supported Hoops4L Y F E  ...
...                        
44713    TOMORROW you can buy the   charity compilation...
44714        That needs to be erected at the State Capi...
44715      Clay County Sheriffs  Motto  To Screw  amp  ...
44716      Films Eleven Films bravo         Bad ass vid...
44717              everybody should give this a listen ...

I just fixed it with:

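import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Each row holds a list of tokens, so flatten all rows into one space-separated string
allwords = ' '.join(token for tokens in tweet_table['tokens'] for token in tokens)
word_Cloud = WordCloud(width=500, height=300, random_state=21,
                       max_font_size=119).generate(allwords)
plt.imshow(word_Cloud, interpolation='bilinear')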
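
To see why the original join failed, here is a minimal sketch that uses a plain list of token lists as a stand-in for tweet_table['tokens'] (iterating over that column yields one list per row in the same way). The name flatten_tokens and the sample rows are made up for illustration. ''.join() expects strings, so handing it lists raises the TypeError from the question. Calling str() on the whole column would not help either, because for a long Series it returns the truncated display text rather than every token.

```python
def flatten_tokens(rows):
    """Join every token from every row into one space-separated string."""
    return " ".join(token for tokens in rows for token in tokens)


# Hypothetical sample rows standing in for tweet_table['tokens']
rows = [["everybody", "give", "listen", "blm"], ["films", "eleven", "films"]]

try:
    "".join(rows)  # each item is a list, not a str
except TypeError as exc:
    join_error = str(exc)  # same message as in the traceback above

all_words = flatten_tokens(rows)
print(all_words)  # everybody give listen blm films eleven films
```

With the real data, flatten_tokens(tweet_table['tokens']) should produce the string to pass to WordCloud.generate.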

Here tweet_table['tokens'] contains no stop words. Otherwise, create a list of stop words and pass it to WordCloud, as in the code below:

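import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

# Extend the built-in stop word set with extra tokens to drop
stopwords_newlist = ["https", "co"] + list(STOPWORDS)
# Flatten the per-row token lists into one space-separated string
allwords = ' '.join(token for tokens in tweet_table['tokens'] for token in tokens)
word_Cloud = WordCloud(width=500, height=300, random_state=21, stopwords=stopwords_newlist,
                       max_font_size=119).generate(allwords)

plt.imshow(word_Cloud, interpolation='bilinear')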
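
Since the tokens are already split and cleaned, another option is to count them yourself and hand the counts to WordCloud.generate_from_frequencies, which skips the library's own text processing. This is only a rough sketch: the helper names token_frequencies and plot_cloud are invented here, and it assumes wordcloud and matplotlib are installed. As far as I know, the stopwords argument is only applied when WordCloud processes raw text, so the sketch filters the extra stop words before counting.

```python
from collections import Counter


def token_frequencies(rows, extra_stopwords=()):
    """Count tokens across all rows, skipping anything in extra_stopwords."""
    blocked = {word.lower() for word in extra_stopwords}
    return Counter(
        token for tokens in rows for token in tokens
        if token.lower() not in blocked
    )


def plot_cloud(frequencies):
    """Draw a word cloud from a token -> count mapping."""
    # Imported here so the counting helper works without these packages.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud(width=500, height=300, random_state=21, max_font_size=119)
    cloud.generate_from_frequencies(frequencies)
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()


# Hypothetical sample rows standing in for tweet_table['tokens']
rows = [["films", "eleven", "films", "https"], ["listen", "blm", "co"]]
freqs = token_frequencies(rows, extra_stopwords=["https", "co"])
```

With the real data, plot_cloud(token_frequencies(tweet_table['tokens'], ["https", "co"])) should then draw the cloud.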
