My df looks like this:
team_name text
--------- ----
red this is text from red team
blue this is text from blue team
green this is text from green team
yellow this is text from yellow team
I'm trying to get this:
team_name text text_token
--------- ---- ----------
red this is text from red team ['this', 'is', 'text', 'from', 'red', 'team']
blue this is text from blue team ['this', 'is', 'text', 'from', 'blue', 'team']
green this is text from green team ['this', 'is', 'text', 'from', 'green', 'team']
yellow this is text from yellow team ['this', 'is', 'text', 'from', 'yellow', 'team']
What I've tried:
df['text_token'] = nltk.word_tokenize(df['text'])
But this doesn't work. How can I get the result I want? Is it also possible to compute a frequency distribution?
Stack Overflow has several examples you can look at.
This has already been solved in the linked question: how to use word_tokenize in a data frame
import nltk
df['text_token'] = df.apply(lambda row: nltk.word_tokenize(row['text']), axis=1)
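A minimal sketch covering both parts of the question, assuming pandas and NLTK are installed. The sample DataFrame just rebuilds the one from the question. The attempt above fails because word_tokenize expects a single string, not a whole Series, so the sketch applies it to each cell of the column instead. It then uses nltk.FreqDist for both a per-row and a corpus-wide frequency distribution. Passing preserve_line=True skips NLTK's sentence splitter, so no Punkt model download is needed for this single-sentence data; for multi-sentence text, drop that argument and run nltk.download('punkt') first (newer NLTK releases use 'punkt_tab'). The column name token_freq and the variable all_freq are hypothetical names chosen for illustration.

```python
import nltk
import pandas as pd

# Sample data mirroring the DataFrame from the question.
df = pd.DataFrame({
    "team_name": ["red", "blue", "green", "yellow"],
    "text": [
        "this is text from red team",
        "this is text from blue team",
        "this is text from green team",
        "this is text from yellow team",
    ],
})

# word_tokenize takes one string, so apply it cell by cell.
# preserve_line=True skips sentence splitting (no Punkt data required).
df["text_token"] = df["text"].apply(
    lambda s: nltk.word_tokenize(s, preserve_line=True)
)

# Per-row frequency distribution (hypothetical column name): one FreqDist per row.
df["token_freq"] = df["text_token"].apply(nltk.FreqDist)

# Corpus-wide frequency distribution over every token in the column.
all_freq = nltk.FreqDist(tok for tokens in df["text_token"] for tok in tokens)

print(df[["team_name", "text_token"]])
print(all_freq.most_common(5))
```

FreqDist is a Counter subclass, so all_freq['team'] or all_freq.most_common(n) work as usual. If you prefer to stay within pandas, df['text_token'].explode().value_counts() produces the same overall counts.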