How to extract ticker names from a pandas DataFrame



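I have the following data, which has been converted into a pandas DataFrame (the lines below are copied and pasted directly because I don't know how to import them).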

{17: '200 shares ExD 2022-09-21 PD 2022-09-30 dividend GAIN.NASDAQ 15.00 USD (0.075 per share) tax -2.25 USD (-15.000%) DivCntry US USIncmCode 06',
18: '101 shares ExD 2022-09-21 PD 2022-09-30 dividend LTC.NYSE 19.19 USD (0.19 per share) tax -2.88 USD (-15.000%) DivCntry US USIncmCode 06',    
19: '302 shares ExD 2022-09-29 PD 2022-10-12 dividend AGNC.NASDAQ 36.24 USD (0.12 per share) tax -5.44 USD (-15.000%) DivCntry US USIncmCode 06',     
20: '92 shares ExD 2022-07-07 PD 2022-08-22 dividend BTI.NYSE 60.31 USD (0.655523 per share) tax -0.00 USD (-0.0%) DivCntry GB fee amount -0.46 USD (0.005 per share)',     
21: '75 shares ExD 2022-09-14 PD 2022-10-11 dividend MO.NYSE 70.50 USD (0.94 per share) tax -10.58 USD (-15.000%) DivCntry US USIncmCode 06'}

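I need code that extracts the ticker name from each description. My attempt is below, but it returns the whole description again. Is there a way to write it so that the result contains only the tickers (e.g. GAIN.NASDAQ, LTC.NYSE, AGNC.NASDAQ, BTI.NYSE, MO.NYSE)?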

import pandas as pd
....
description = dividends[["Description"]]      # a frame dubbed "Description" with lines such as above                                    
ticker = description[description['Description'].str.contains('.NYSE')]
print(ticker)

If you do not have a specific list of tickers, just rely on the pattern of the description and split the string:

df['ticker'] = df['description'].str.split('dividend ').str[-1].str.split().str[0]
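To see what the chained .str calls do, here is a minimal sketch of the same two steps on a single plain Python string. The variable names after_marker and ticker are only illustrative, and it assumes every description contains the literal text 'dividend ' immediately before the ticker.

```python
# One description string copied from the sample data in the question.
text = ('200 shares ExD 2022-09-21 PD 2022-09-30 dividend GAIN.NASDAQ '
        '15.00 USD (0.075 per share) tax -2.25 USD (-15.000%) '
        'DivCntry US USIncmCode 06')

# Step 1: keep everything after the literal marker 'dividend '.
after_marker = text.split('dividend ')[-1]

# Step 2: split the remainder on whitespace and take the first token.
ticker = after_marker.split()[0]

print(ticker)  # GAIN.NASDAQ
```

If a row does not contain 'dividend ', step 1 leaves the string unchanged and step 2 returns its first word, so it may be worth checking for such rows first.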

Or use a regex instead:

df['ticker'] = df['description'].str.extract(r'(\b[A-Z]\w+\.[A-Z]\w+)')
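As a rough breakdown of the pattern \b[A-Z]\w+\.[A-Z]\w+, the sketch below applies it to one sample row with the standard re module. The name TICKER_RE is just an illustrative choice, and the sketch assumes tickers always look like SYMBOL.EXCHANGE with an uppercase first letter on each side of the dot.

```python
import re

# \b        word boundary, so the match starts at the beginning of a token
# [A-Z]\w+  an uppercase letter followed by one or more word characters
# \.        a literal dot (an unescaped . would match any character)
# [A-Z]\w+  the exchange suffix, e.g. NYSE or NASDAQ
TICKER_RE = re.compile(r'\b[A-Z]\w+\.[A-Z]\w+')

text = ('75 shares ExD 2022-09-14 PD 2022-10-11 dividend MO.NYSE 70.50 USD '
        '(0.94 per share) tax -10.58 USD (-15.000%) DivCntry US USIncmCode 06')

match = TICKER_RE.search(text)
ticker = match.group(0) if match else None

print(ticker)  # MO.NYSE
```

Note that [A-Z]\w+ needs at least two characters, so a one-letter symbol such as T.NYSE would not match; changing \w+ to \w* on the symbol side is one possible adjustment if your data contains such symbols.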

To extract the tickers as a list:

df['description'].str.extract(r'(\b[A-Z]\w+\.[A-Z]\w+)')[0].tolist()
-> ['GAIN.NASDAQ', 'LTC.NYSE', 'AGNC.NASDAQ', 'BTI.NYSE', 'MO.NYSE']

To avoid duplicates, use set():

set(df['description'].str.extract(r'(\b[A-Z]\w+\.[A-Z]\w+)')[0].tolist())
-> {'AGNC.NASDAQ', 'BTI.NYSE', 'GAIN.NASDAQ', 'LTC.NYSE', 'MO.NYSE'}
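A set does not keep the row order. If the order of first appearance matters, one possible alternative is drop_duplicates() on the extracted Series. The sketch below uses shortened, hypothetical rows modeled on the question's data, with one ticker repeated on purpose.

```python
import pandas as pd

# Shortened, hypothetical rows; GAIN.NASDAQ appears twice on purpose.
df = pd.DataFrame({'description': [
    '200 shares ExD 2022-09-21 PD 2022-09-30 dividend GAIN.NASDAQ 15.00 USD',
    '101 shares ExD 2022-09-21 PD 2022-09-30 dividend LTC.NYSE 19.19 USD',
    '200 shares ExD 2022-06-21 PD 2022-06-30 dividend GAIN.NASDAQ 15.00 USD',
]})

# expand=False returns a Series instead of a one-column DataFrame.
tickers = df['description'].str.extract(r'(\b[A-Z]\w+\.[A-Z]\w+)', expand=False)

# drop_duplicates() keeps the first occurrence and preserves row order.
unique_in_order = tickers.drop_duplicates().tolist()

print(unique_in_order)  # ['GAIN.NASDAQ', 'LTC.NYSE']
```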

Example

This creates an additional column holding the ticker in your DataFrame:

import pandas as pd
d = {17: '200 shares ExD 2022-09-21 PD 2022-09-30 dividend GAIN.NASDAQ 15.00 USD (0.075 per share) tax -2.25 USD (-15.000%) DivCntry US USIncmCode 06',
18: '101 shares ExD 2022-09-21 PD 2022-09-30 dividend LTC.NYSE 19.19 USD (0.19 per share) tax -2.88 USD (-15.000%) DivCntry US USIncmCode 06', 
19: '302 shares ExD 2022-09-29 PD 2022-10-12 dividend AGNC.NASDAQ 36.24 USD (0.12 per share) tax -5.44 USD (-15.000%) DivCntry US USIncmCode 06',
20: '92 shares ExD 2022-07-07 PD 2022-08-22 dividend BTI.NYSE 60.31 USD (0.655523 per share) tax -0.00 USD (-0.0%) DivCntry GB fee amount -0.46 USD (0.005 per share)', 
21: '75 shares ExD 2022-09-14 PD 2022-10-11 dividend MO.NYSE 70.50 USD (0.94 per share) tax -10.58 USD (-15.000%) DivCntry US USIncmCode 06'}
df = pd.DataFrame(d.values(), columns=['description'])
df['ticker'] = df['description'].str.extract(r'(\b[A-Z]\w+\.[A-Z]\w+)')
df[['ticker','description']]

Output

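   ticker       description
0  GAIN.NASDAQ  200 shares ExD 2022-09-21 PD 2022-09-30 dividend ...
1  LTC.NYSE     101 shares ExD 2022-09-21 PD 2022-09-30 dividend ...
2  AGNC.NASDAQ  302 shares ExD 2022-09-29 PD 2022-10-12 dividend ...
3  BTI.NYSE     92 shares ExD 2022-07-07 PD 2022-08-22 dividend ...
4  MO.NYSE      75 shares ExD 2022-09-14 PD 2022-10-11 dividend ...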
