How to extract words from a list in each row in Python



I have a DataFrame:

0        2021-03-19 14:59:49+00:00  ...  I only need uxy to hit 20 eod to make up for a...
1        2021-03-19 14:59:51+00:00  ...                                 Oh this isn’t good
2        2021-03-19 14:59:51+00:00  ...  lads why is my account covered in more red ink...
3        2021-03-19 14:59:51+00:00  ...  I'm tempted to drop my last 800 into some stup...
4        2021-03-19 14:59:52+00:00  ...  The sell offs will continue until moral improves

And I have a list:

names = ['SRNE', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE']

I want to check whether each row contains any of these words and, if so, output the words found in each row. This is what I tried:

pat = '|'.join(r"\b{}\b".format(x) for x in names)
df = bearish.set_index('dt')['text'].str.extractall('(' + pat + ')')[0].reset_index(name='tickers')
df1 = pd.crosstab(df['dt'], df['tickers'])

But it gives me an empty DataFrame. Thank you.

You can do it like this:

Sample input:

import pandas as pd
d = {'index': {0: 1, 1: 2, 2: 3}, 'txt': {0: 'random text with A', 1: 'random text with B and C', 2: 'random text number A with D and E'}}
df = pd.DataFrame(d)

Code:

lst = ['A', 'B', 'C', 'D', 'E']
pat = '|'.join(r"\b{}\b".format(x) for x in lst)
df['found'] = df['txt'].str.findall(pat)

Output:

0          [A]
1       [B, C]
2    [A, D, E]
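
Applied to the ticker list from the question, the same idea might look like the sketch below. The rows in `bearish` are made up for illustration, and `re.escape`, case-insensitive matching, and upper-casing the matches are assumptions added here, not part of the original answer.

```python
import re
import pandas as pd

names = ['SRNE', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE']

# Hypothetical sample rows standing in for the question's `bearish` frame.
bearish = pd.DataFrame({
    'dt': ['2021-03-19 14:59:49', '2021-03-19 14:59:51', '2021-03-19 14:59:52'],
    'text': ['GME and AMC are down again',
             'Oh this isn’t good',
             'selling my tsla and GME'],
})

# One alternation wrapped in word boundaries; re.escape guards against
# names that contain regex metacharacters.
pat = r'\b(?:' + '|'.join(re.escape(x) for x in names) + r')\b'

# List of matches per row (an empty list when nothing matches).
bearish['tickers'] = bearish['text'].str.findall(pat, flags=re.IGNORECASE)

# One row per match, rows without a match dropped, tickers normalised to upper case.
long = (
    bearish[['dt', 'tickers']]
    .explode('tickers')
    .dropna(subset=['tickers'])
    .assign(tickers=lambda d: d['tickers'].str.upper())
)

# Count of each ticker per timestamp.
counts = pd.crosstab(long['dt'], long['tickers'])
print(bearish[['dt', 'tickers']])
print(counts)
```

If none of the rows mention a ticker, `long` is empty and so is the crosstab, which may be one reason for an empty result even when the pattern itself is correct.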
