Is there a way in pandas to explode rows into a specified number of rows?



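I have a DataFrame like the one below. Each unique deal id has multiple bank subsidiaries, and each bank subsidiary has a row that also contains the list of bank parents, where the number of bank parents can be smaller than the number of bank subsidiaries.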

[In]: 
import pandas as pd

d = {'Bank Parent': [["Barclays", "UBS", "BofA"], ["Barclays", "UBS", "BofA"], ["Barclays", "UBS", "BofA"],
                     ["China Merchants Bank Co Ltd", "UniCredit"], ["China Merchants Bank Co Ltd", "UniCredit"],
                     ["China Merchants Bank Co Ltd", "UniCredit"]],
     'Bank Subsidiary': ["Barclays plc", "UBS AG", "BofA Securities Inc", "CMB International Capital Ltd", "UniCredit Bank AG",
                         "Goldman Sachs (Asia) LLC"],
     "Deal id": [1, 1, 1, 2, 2, 2]}
df = pd.DataFrame(data=d)
df
[Out]: 
   Bank Parent                               Bank Subsidiary                Deal id
0  [Barclays, UBS, BofA]                     Barclays plc                   1
1  [Barclays, UBS, BofA]                     UBS AG                         1
2  [Barclays, UBS, BofA]                     BofA Securities Inc            1
3  [China Merchants Bank Co Ltd, UniCredit]  CMB International Capital Ltd  2
4  [China Merchants Bank Co Ltd, UniCredit]  UniCredit Bank AG              2
5  [China Merchants Bank Co Ltd, UniCredit]  Goldman Sachs (Asia) LLC       2

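I would like to explode the "Bank Parent" column and get the resulting DataFrame shown below, with "NA" where a subsidiary has no corresponding parent.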

[Out]: 
Bank Parent                     Bank Subsidiary                 Deal id
0   Barclays                    Barclays plc                      1
1   UBS                         UBS AG                            1
2   BofA                        BofA Securities Inc               1
3   China Merchants Bank Co Ltd CMB International Capital Ltd     2
4   UniCredit                   UniCredit Bank AG                 2
5   NA                          Goldman Sachs (Asia) LLC          2

So far I have tried a pandas method, but I did not get the desired result.

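It looks like you want to select the Parent "diagonally" from within the same group. If that is the case, we first create a column idx that tells us the position of each row within its group: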

df['idx'] = df.groupby(df['Bank Parent'].apply('_'.join), group_keys = False).cumcount()
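
To make the role of cumcount clearer, here is a minimal sketch on a made-up frame. The frame `toy` and its column `key` are hypothetical and only serve as illustration; the point is that cumcount numbers the rows inside each group starting at 0, which is the position we later use to index into the list.

```python
import pandas as pd

# Hypothetical toy data: two groups identified by the column "key".
toy = pd.DataFrame({"key": ["a", "a", "a", "b", "b"]})

# cumcount numbers the rows within each group, starting at 0.
toy["idx"] = toy.groupby("key").cumcount()

print(toy)
```

In the answer above, the group key is the joined list of parent names rather than a plain column, but the numbering works the same way.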

Then, row by row, we pick the relevant element from the list in "Bank Parent" and put it into a new column "Parent":

df['Parent'] = df.apply(lambda r: r['Bank Parent'][r['idx']], axis=1)

df now looks like this:

    Bank Parent                                   Bank Subsidiary                  idx  Parent
--  --------------------------------------------  -----------------------------  -----  ---------------------------
 0  ['Barclays', 'UBS', 'BofA']                   Barclays plc                       0  Barclays
 1  ['Barclays', 'UBS', 'BofA']                   UBS AG                             1  UBS
 2  ['Barclays', 'UBS', 'BofA']                   BofA Securities Inc                2  BofA
 3  ['China Merchants Bank Co Ltd', 'UniCredit']  CMB International Capital Ltd      0  China Merchants Bank Co Ltd
 4  ['China Merchants Bank Co Ltd', 'UniCredit']  UniCredit Bank AG                  1  UniCredit

You can then df.drop(...) the columns you no longer need.
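
As a rough sketch of that clean-up step, assuming you want the single-value column to take over the name "Bank Parent" (the rename is my assumption, not something the question requires), it could look like this on a shortened two-row version of the data:

```python
import pandas as pd

# Shortened two-row version of the question's data, for illustration only.
df = pd.DataFrame({
    "Bank Parent": [["Barclays", "UBS"], ["Barclays", "UBS"]],
    "Bank Subsidiary": ["Barclays plc", "UBS AG"],
})

# The two steps from the answer: position within the group, then pick by position.
df["idx"] = df.groupby(df["Bank Parent"].apply("_".join)).cumcount()
df["Parent"] = df.apply(lambda r: r["Bank Parent"][r["idx"]], axis=1)

# Drop the helper columns and let "Parent" take over the original column name.
result = (
    df.drop(columns=["Bank Parent", "idx"])
      .rename(columns={"Parent": "Bank Parent"})
)

print(result)
```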

Edit

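If, as in the revised question, the list of parent banks is too short, we can replace the second line of the solution with the following, which essentially pads the list with None (missing) values: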

df['Parent'] = df.apply(lambda r: (r['Bank Parent'] + [None]*r['idx'])[r['idx']], axis=1)

Output with the revised df:

    Bank Parent                                   Bank Subsidiary                  Deal id    idx  Parent
--  --------------------------------------------  -----------------------------  ---------  -----  ---------------------------
 0  ['Barclays', 'UBS', 'BofA']                   Barclays plc                           1      0  Barclays
 1  ['Barclays', 'UBS', 'BofA']                   UBS AG                                 1      1  UBS
 2  ['Barclays', 'UBS', 'BofA']                   BofA Securities Inc                    1      2  BofA
 3  ['China Merchants Bank Co Ltd', 'UniCredit']  CMB International Capital Ltd          2      0  China Merchants Bank Co Ltd
 4  ['China Merchants Bank Co Ltd', 'UniCredit']  UniCredit Bank AG                      2      1  UniCredit
 5  ['China Merchants Bank Co Ltd', 'UniCredit']  Goldman Sachs (Asia) LLC               2      2
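
For readers who prefer an explicit bounds check over padding, here is a possible variant of the same idea. Two things in it are my assumptions rather than part of the answer above: it groups on "Deal id" instead of the joined parent names, and it uses a hypothetical helper named `pick_parent`. It also tolerates an empty parent list, which the padding one-liner would not.

```python
import pandas as pd

d = {
    "Bank Parent": [["Barclays", "UBS", "BofA"]] * 3
                   + [["China Merchants Bank Co Ltd", "UniCredit"]] * 3,
    "Bank Subsidiary": ["Barclays plc", "UBS AG", "BofA Securities Inc",
                        "CMB International Capital Ltd", "UniCredit Bank AG",
                        "Goldman Sachs (Asia) LLC"],
    "Deal id": [1, 1, 1, 2, 2, 2],
}
df = pd.DataFrame(d)

# Assumption: "Deal id" identifies the group, so we can count positions per deal.
df["idx"] = df.groupby("Deal id").cumcount()


def pick_parent(row):
    """Hypothetical helper: return the idx-th parent, or None if the list is too short."""
    parents = row["Bank Parent"]
    i = row["idx"]
    return parents[i] if i < len(parents) else None


df["Parent"] = df.apply(pick_parent, axis=1)

print(df[["Bank Subsidiary", "Deal id", "Parent"]])
```

Like the original solution, this matches parents to subsidiaries purely by position within the group, not by name.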
