I have a DataFrame:
0 2021-03-19 14:59:49+00:00 ... I only need uxy to hit 20 eod to make up for a...
1 2021-03-19 14:59:51+00:00 ... Oh this isn’t good
2 2021-03-19 14:59:51+00:00 ... lads why is my account covered in more red ink...
3 2021-03-19 14:59:51+00:00 ... I'm tempted to drop my last 800 into some stup...
4 2021-03-19 14:59:52+00:00 ... The sell offs will continue until moral improves
And I have a list:
names = ['SRNE', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE']
I want to check whether each row contains any of these words and, if so, output the words found in each row. This is what I tried:
pat = '|'.join(r"\b{}\b".format(x) for x in names)
df = bearish.set_index('dt')['text'].str.extractall('(' + pat + ')')[0].reset_index(name='tickers')
df1 = pd.crosstab(df['dt'], df['tickers'])
But it gives me an empty DataFrame. Thank you.
You can use str.findall like this:
Sample input:
import pandas as pd
d = {'index': {0: 1, 1: 2, 2: 3}, 'txt': {0: 'random text with A', 1: 'random text with B and C', 2: 'random text number A with D and E'}}
df = pd.DataFrame(d)
Code:
lst = ['A', 'B', 'C', 'D', 'E']
# \b is a regex word boundary, so 'A' matches only as a whole word
pat = '|'.join(r"\b{}\b".format(x) for x in lst)
df['found'] = df['txt'].str.findall(pat)
Output (the 'found' column):
0 [A]
1 [B, C]
2 [A, D, E]
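The same idea can probably be carried over to the ticker list from the question. The following is only a sketch: the rows in bearish are made up for illustration (the real text in the question is truncated), and using re.escape, explode, and crosstab to get per-timestamp counts is one possible approach, not necessarily what the asker needs.

```python
import re
import pandas as pd

names = ['SRNE', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE']

# Hypothetical sample rows (assumption): the question's real text is truncated.
bearish = pd.DataFrame({
    'dt': pd.to_datetime(['2021-03-19 14:59:49+00:00',
                          '2021-03-19 14:59:51+00:00',
                          '2021-03-19 14:59:52+00:00']),
    'text': ['GME and AMC are both red today',
             "Oh this isn't good",
             'selling my TSLA, buying more GME'],
})

# Whole-word pattern; re.escape guards against regex metacharacters in a name.
pat = '|'.join(r'\b{}\b'.format(re.escape(x)) for x in names)

# List of tickers found in each row (empty list when none match).
bearish['tickers'] = bearish['text'].str.findall(pat)

# One row per (dt, ticker); rows without a match become NaN and are dropped.
exploded = bearish.explode('tickers').dropna(subset=['tickers'])

# Count of each ticker per timestamp.
counts = pd.crosstab(exploded['dt'], exploded['tickers'])
print(bearish[['dt', 'tickers']])
print(counts)
```

Note that the match is case-sensitive, so a lowercase mention such as 'gme' would not be found. Passing flags=re.IGNORECASE to str.findall should relax that if needed.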