Pandas: index with multiple related rows



Consider the market_ticker DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Ticker': ['EWZ US 05/29/20 P27', 'HSI US 12/30/20 C24800', 'TLT US 06/19/20 C225', 'EWZ US 05/29/20 P27'],
                   'Market': ['US NYSE', 'US NYSE', 'HK HKSE', 'US NYSE']})
df['Reduced_Ticker'] = df['Ticker'].apply(lambda a :" ".join(a.split(" ", 2)[:2]))
market_ticker = df[['Market','Reduced_Ticker']].groupby(['Market']).agg(list)
market_ticker['Reduced_Ticker'] = market_ticker['Reduced_Ticker'].apply(lambda x: pd.unique(x))
market_ticker

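How can I turn each item of the list stored under an index value into its own row, still associated with that index value? Expected output: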

Market  |  Reduced_Ticker
HK HKSE |  TLT US
_________________________
US NYSE |  EWZ US
        |  HSI US


Try explode:

market_ticker.explode('Reduced_Ticker')
    Reduced_Ticker
Market  
HK HKSE TLT US
US NYSE EWZ US
US NYSE HSI US

Edit

Actually, you only need one more step on top of your original solution: explode on the Reduced_Ticker column:

market_ticker = market_ticker.explode('Reduced_Ticker')
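
For reference, here is a minimal end-to-end sketch of that step using the sample data from the question. Note one substitution: the grouped frame is built with `groupby(...).unique()` rather than the question's `agg(list)` followed by `pd.unique`, which should yield the same per-market unique values; the name `exploded` is just illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Ticker': ['EWZ US 05/29/20 P27', 'HSI US 12/30/20 C24800',
               'TLT US 06/19/20 C225', 'EWZ US 05/29/20 P27'],
    'Market': ['US NYSE', 'US NYSE', 'HK HKSE', 'US NYSE'],
})

# Keep only the first two space-separated tokens of each ticker.
df['Reduced_Ticker'] = df['Ticker'].apply(lambda a: " ".join(a.split(" ", 2)[:2]))

# One row per market, holding an array of that market's unique reduced tickers.
# (Substitution: the question used agg(list) followed by pd.unique.)
market_ticker = df.groupby('Market')['Reduced_Ticker'].unique().to_frame()

# explode gives each array element its own row and repeats the index value.
exploded = market_ticker.explode('Reduced_Ticker')
print(exploded)
```

If Market is needed as a regular column rather than the index, calling `exploded.reset_index()` afterwards should do it.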

If you like, you can still refer to my solution below:

First, create a new column named Reduced_Ticker by slicing the strings in the Ticker column. Then select only the columns you want (Reduced_Ticker and Market) and drop the duplicate rows.

df_out = (
    df
    .assign(Reduced_Ticker = df.Ticker.str[:6])
    [['Market','Reduced_Ticker']]
    .drop_duplicates()
)

df_out:

Market  Reduced_Ticker
US NYSE EWZ US
US NYSE HSI US
HK HKSE TLT US
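
One caveat: the `str[:6]` slice relies on every ticker starting with a three-character symbol, a space, and a two-letter code. If that does not hold for your data, a variant along these lines might be safer. It is only a sketch, and it assumes the part you want is always the first two space-separated tokens, as in the question's own lambda; the name `reduced` is illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Ticker': ['EWZ US 05/29/20 P27', 'HSI US 12/30/20 C24800',
               'TLT US 06/19/20 C225', 'EWZ US 05/29/20 P27'],
    'Market': ['US NYSE', 'US NYSE', 'HK HKSE', 'US NYSE'],
})

# Split on spaces (at most 2 splits), keep the first two tokens, re-join them.
# Assumption: the wanted prefix is always the first two tokens of the ticker.
reduced = df['Ticker'].str.split(' ', n=2).str[:2].str.join(' ')

df_out = (
    df
    .assign(Reduced_Ticker=reduced)
    [['Market', 'Reduced_Ticker']]
    .drop_duplicates()
)
print(df_out)
```

On the sample data this gives the same three rows as the slicing version.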
