I have the following data, which I converted into a pandas DataFrame (the lines below are copied and pasted as-is, because I don't know how to import it).
{17: '200 shares ExD 2022-09-21 PD 2022-09-30 dividend GAIN.NASDAQ 15.00 USD (0.075 per share) tax -2.25 USD (-15.000%) DivCntry US USIncmCode 06',
18: '101 shares ExD 2022-09-21 PD 2022-09-30 dividend LTC.NYSE 19.19 USD (0.19 per share) tax -2.88 USD (-15.000%) DivCntry US USIncmCode 06',
19: '302 shares ExD 2022-09-29 PD 2022-10-12 dividend AGNC.NASDAQ 36.24 USD (0.12 per share) tax -5.44 USD (-15.000%) DivCntry US USIncmCode 06',
20: '92 shares ExD 2022-07-07 PD 2022-08-22 dividend BTI.NYSE 60.31 USD (0.655523 per share) tax -0.00 USD (-0.0%) DivCntry GB fee amount -0.46 USD (0.005 per share)',
21: '75 shares ExD 2022-09-14 PD 2022-10-11 dividend MO.NYSE 70.50 USD (0.94 per share) tax -10.58 USD (-15.000%) DivCntry US USIncmCode 06'}
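I need code that extracts the ticker names from this. My attempt is below, but it again returns the whole description. Is there a way to write it so that the result contains only the tickers (e.g. GAIN.NASDAQ, LTC.NYSE, AGNC.NASDAQ, BTI.NYSE, MO.NYSE)?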
import pandas as pd
....
description = dividends[["Description"]] # a frame dubbed "Description" with lines such as above
ticker = description[description['Description'].str.contains('.NYSE')]
print(ticker)
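Why the attempt above returns the full text: str.contains produces a boolean mask (one True/False per row), and indexing the frame with that mask keeps the matching rows unchanged. It never pulls out the matching substring. A minimal sketch using two of the sample rows; it also passes regex=False, because by default the pattern is a regular expression in which "." matches any character, not just a literal dot.

```python
import pandas as pd

# Two of the sample descriptions from the question
df = pd.DataFrame({'Description': [
    '200 shares ExD 2022-09-21 PD 2022-09-30 dividend GAIN.NASDAQ 15.00 USD (0.075 per share) tax -2.25 USD (-15.000%) DivCntry US USIncmCode 06',
    '101 shares ExD 2022-09-21 PD 2022-09-30 dividend LTC.NYSE 19.19 USD (0.19 per share) tax -2.88 USD (-15.000%) DivCntry US USIncmCode 06',
]})

# str.contains returns one True/False per row, not the matched text;
# regex=False makes '.NYSE' a literal substring search
mask = df['Description'].str.contains('.NYSE', regex=False)
print(mask.tolist())  # [False, True]

# Indexing with the mask keeps whole rows, so the full description survives
filtered = df[mask]
print(filtered['Description'].iloc[0][:10])  # 101 shares
```

To get only the ticker, the text has to be extracted (split or regex) rather than filtered.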
If you don't have a specific list of tickers, just use the pattern of the description and split the string:
df['ticker'] = df['description'].str.split('dividend ').str[-1].str.split().str[0]
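The one-liner above chains two splits. A minimal, self-contained sketch of the same idea broken into steps, assuming every description contains the word "dividend " exactly once, directly followed by the ticker (if "dividend " were missing, str[-1] would return the whole string and the first token would be the share count instead):

```python
import pandas as pd

df = pd.DataFrame({'description': [
    '200 shares ExD 2022-09-21 PD 2022-09-30 dividend GAIN.NASDAQ 15.00 USD (0.075 per share) tax -2.25 USD (-15.000%) DivCntry US USIncmCode 06',
    '92 shares ExD 2022-07-07 PD 2022-08-22 dividend BTI.NYSE 60.31 USD (0.655523 per share) tax -0.00 USD (-0.0%) DivCntry GB fee amount -0.46 USD (0.005 per share)',
]})

# Step 1: split on 'dividend ' and keep the last piece,
#         e.g. 'GAIN.NASDAQ 15.00 USD (0.075 per share) ...'
after_keyword = df['description'].str.split('dividend ').str[-1]

# Step 2: split the remainder on whitespace and keep the first token, the ticker
df['ticker'] = after_keyword.str.split().str[0]
print(df['ticker'].tolist())  # ['GAIN.NASDAQ', 'BTI.NYSE']
```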
Or use a regex instead:
df['ticker'] = df['description'].str.extract(r'(\b[A-Z]\w+\.[A-Z]\w+)')
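A sketch of what each part of the pattern matches, using two of the sample rows. It passes expand=False, which makes str.extract return a Series rather than a one-column DataFrame when there is a single capture group:

```python
import pandas as pd

df = pd.DataFrame({'description': [
    '302 shares ExD 2022-09-29 PD 2022-10-12 dividend AGNC.NASDAQ 36.24 USD (0.12 per share) tax -5.44 USD (-15.000%) DivCntry US USIncmCode 06',
    '75 shares ExD 2022-09-14 PD 2022-10-11 dividend MO.NYSE 70.50 USD (0.94 per share) tax -10.58 USD (-15.000%) DivCntry US USIncmCode 06',
]})

# \b        word boundary, so the match starts at the beginning of a word
# [A-Z]\w+  an uppercase letter plus one or more word characters (AGNC, MO)
# \.        a literal dot
# [A-Z]\w+  the exchange part (NASDAQ, NYSE)
pattern = r'(\b[A-Z]\w+\.[A-Z]\w+)'

# expand=False: a single capture group comes back as a Series
df['ticker'] = df['description'].str.extract(pattern, expand=False)
print(df['ticker'].tolist())  # ['AGNC.NASDAQ', 'MO.NYSE']
```

Note that [A-Z]\w+ requires at least two characters before the dot, so a one-letter symbol would need [A-Z]\w* instead.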
To extract the tickers as a list:
df['description'].str.extract(r'(\b[A-Z]\w+\.[A-Z]\w+)')[0].tolist()
-> ['GAIN.NASDAQ', 'LTC.NYSE', 'AGNC.NASDAQ', 'BTI.NYSE', 'MO.NYSE']
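The [0] in that line selects a column: with one unnamed capture group, str.extract returns a DataFrame whose only column is labelled 0. A short sketch with three of the sample rows, showing that tolist() keeps the row order of the frame (it does not sort):

```python
import pandas as pd

df = pd.DataFrame({'description': [
    '200 shares ExD 2022-09-21 PD 2022-09-30 dividend GAIN.NASDAQ 15.00 USD (0.075 per share) tax -2.25 USD (-15.000%) DivCntry US USIncmCode 06',
    '101 shares ExD 2022-09-21 PD 2022-09-30 dividend LTC.NYSE 19.19 USD (0.19 per share) tax -2.88 USD (-15.000%) DivCntry US USIncmCode 06',
    '302 shares ExD 2022-09-29 PD 2022-10-12 dividend AGNC.NASDAQ 36.24 USD (0.12 per share) tax -5.44 USD (-15.000%) DivCntry US USIncmCode 06',
]})
pattern = r'(\b[A-Z]\w+\.[A-Z]\w+)'

# One unnamed group -> a DataFrame with a single column labelled 0
extracted = df['description'].str.extract(pattern)
print(list(extracted.columns))  # [0]

# [0] picks that column; tolist() follows the row order
tickers = extracted[0].tolist()
print(tickers)  # ['GAIN.NASDAQ', 'LTC.NYSE', 'AGNC.NASDAQ']
```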
To avoid duplicates, use set():
set(df['description'].str.extract(r'(\b[A-Z]\w+\.[A-Z]\w+)')[0].tolist())
-> {'AGNC.NASDAQ', 'BTI.NYSE', 'GAIN.NASDAQ', 'LTC.NYSE', 'MO.NYSE'}
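A set has no defined order. If the order matters, Series.unique() also drops duplicates but keeps the order of first appearance, and sorted(set(...)) gives alphabetical order. A sketch on a hypothetical extracted ticker column in which LTC.NYSE appears twice (for example two payouts); these values are illustrative only:

```python
import pandas as pd

# Hypothetical extracted tickers; LTC.NYSE appears twice
tickers = pd.Series(['LTC.NYSE', 'GAIN.NASDAQ', 'LTC.NYSE', 'MO.NYSE'])

# set(): duplicates removed, no guaranteed order
as_set = set(tickers)

# unique(): duplicates removed, first-appearance order kept
first_seen = tickers.unique().tolist()
print(first_seen)  # ['LTC.NYSE', 'GAIN.NASDAQ', 'MO.NYSE']

# sorted(set(...)): duplicates removed, alphabetical order
alphabetical = sorted(as_set)
print(alphabetical)  # ['GAIN.NASDAQ', 'LTC.NYSE', 'MO.NYSE']
```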
Example

This creates an additional column with the ticker in your DataFrame:
import pandas as pd
d = {17: '200 shares ExD 2022-09-21 PD 2022-09-30 dividend GAIN.NASDAQ 15.00 USD (0.075 per share) tax -2.25 USD (-15.000%) DivCntry US USIncmCode 06',
18: '101 shares ExD 2022-09-21 PD 2022-09-30 dividend LTC.NYSE 19.19 USD (0.19 per share) tax -2.88 USD (-15.000%) DivCntry US USIncmCode 06',
19: '302 shares ExD 2022-09-29 PD 2022-10-12 dividend AGNC.NASDAQ 36.24 USD (0.12 per share) tax -5.44 USD (-15.000%) DivCntry US USIncmCode 06',
20: '92 shares ExD 2022-07-07 PD 2022-08-22 dividend BTI.NYSE 60.31 USD (0.655523 per share) tax -0.00 USD (-0.0%) DivCntry GB fee amount -0.46 USD (0.005 per share)',
21: '75 shares ExD 2022-09-14 PD 2022-10-11 dividend MO.NYSE 70.50 USD (0.94 per share) tax -10.58 USD (-15.000%) DivCntry US USIncmCode 06'}
df = pd.DataFrame(list(d.values()), columns=['description'])
df['ticker'] = df['description'].str.extract(r'(\b[A-Z]\w+\.[A-Z]\w+)')
df[['ticker','description']]
Output
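| | ticker | description |
|---|---|---|
| 0 | GAIN.NASDAQ | 200 shares ExD 2022-09-21 PD 2022-09-30 dividend GAIN.NASDAQ 15.00 USD ... |
| 1 | LTC.NYSE | 101 shares ExD 2022-09-21 PD 2022-09-30 dividend LTC.NYSE 19.19 USD ... |
| 2 | AGNC.NASDAQ | 302 shares ExD 2022-09-29 PD 2022-10-12 dividend AGNC.NASDAQ 36.24 USD ... |
| 3 | BTI.NYSE | 92 shares ExD 2022-07-07 PD 2022-08-22 dividend BTI.NYSE 60.31 USD ... |
| 4 | MO.NYSE | 75 shares ExD 2022-09-14 PD 2022-10-11 dividend MO.NYSE 70.50 USD ... |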
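A possible variant that combines both ideas: anchor the regex on the word "dividend" and capture the next whitespace-free token, so the ticker does not have to follow the uppercase pattern. This is a sketch assuming every description has the form "... dividend TICKER amount ...". Building the frame with pd.Series(d).to_frame(...) also keeps the original row labels (17, 18, ...) instead of 0 to 4.

```python
import pandas as pd

# Two of the rows from the question, keyed by their original labels
d = {17: '200 shares ExD 2022-09-21 PD 2022-09-30 dividend GAIN.NASDAQ 15.00 USD (0.075 per share) tax -2.25 USD (-15.000%) DivCntry US USIncmCode 06',
     20: '92 shares ExD 2022-07-07 PD 2022-08-22 dividend BTI.NYSE 60.31 USD (0.655523 per share) tax -0.00 USD (-0.0%) DivCntry GB fee amount -0.46 USD (0.005 per share)'}

# Series from the dict keeps 17 and 20 as the index
df = pd.Series(d).to_frame('description')

# 'dividend', then whitespace, then capture the next run of non-space characters
df['ticker'] = df['description'].str.extract(r'dividend\s+(\S+)', expand=False)
print(df['ticker'].tolist())  # ['GAIN.NASDAQ', 'BTI.NYSE']
```

Rows where "dividend" does not occur would get NaN in the ticker column rather than a wrong token.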