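I have a DataFrame like the one below. Each unique deal id has multiple bank subsidiaries, and each bank subsidiary has a row that contains the list of bank parents for that deal. The number of bank parents can be smaller than the number of bank subsidiaries.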
[In]:
import pandas as pd

# Each row holds the full list of parent banks for its deal.
d = {'Bank Parent': [["Barclays", "UBS", "BofA"], ["Barclays", "UBS", "BofA"], ["Barclays", "UBS", "BofA"],
                     ["China Merchants Bank Co Ltd", "UniCredit"], ["China Merchants Bank Co Ltd", "UniCredit"],
                     ["China Merchants Bank Co Ltd", "UniCredit"]],
     'Bank Subsidiary': ["Barclays plc", "UBS AG", "BofA Securities Inc", "CMB International Capital Ltd",
                         "UniCredit Bank AG", "Goldman Sachs (Asia) LLC"],
     "Deal id": [1, 1, 1, 2, 2, 2]}
df = pd.DataFrame(data=d)
df
[Out]:
   Bank Parent                               Bank Subsidiary                Deal id
0  [Barclays, UBS, BofA]                     Barclays plc                   1
1  [Barclays, UBS, BofA]                     UBS AG                         1
2  [Barclays, UBS, BofA]                     BofA Securities Inc            1
3  [China Merchants Bank Co Ltd, UniCredit]  CMB International Capital Ltd  2
4  [China Merchants Bank Co Ltd, UniCredit]  UniCredit Bank AG              2
5  [China Merchants Bank Co Ltd, UniCredit]  Goldman Sachs (Asia) LLC       2
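I want to unpack the "Bank Parent" column so that each row holds a single parent, with "NA" where there is no matching parent, so that the resulting DataFrame looks like this: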
[Out]:
   Bank Parent                  Bank Subsidiary                Deal id
0  Barclays                     Barclays plc                   1
1  UBS                          UBS AG                         1
2  BofA                         BofA Securities Inc            1
3  China Merchants Bank Co Ltd  CMB International Capital Ltd  2
4  UniCredit                    UniCredit Bank AG              2
5  NA                           Goldman Sachs (Asia) LLC       2
So far I have tried a pandas method, but I did not get the desired result.
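It looks like you want to pick the Parent "diagonally" from within each group. If that is the case, we first create a column idx that gives the position of each row, and therefore of its Parent, within the group: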
df['idx'] = df.groupby(df['Bank Parent'].apply('_'.join), group_keys = False).cumcount()
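To make the grouping step easier to follow, here is a minimal sketch on a toy frame. The data and the names toy and key are made up for illustration. It shows why the lists are joined into strings first (lists are unhashable, so they cannot serve as group keys) and what cumcount() returns.

```python
import pandas as pd

# Toy frame (illustrative data only): two groups, each sharing one parent list.
toy = pd.DataFrame({
    "Bank Parent": [["A", "B"], ["A", "B"], ["C"], ["C"]],
    "Bank Subsidiary": ["A plc", "B AG", "C Ltd", "C Inc"],
})

# Lists are unhashable and cannot be group keys directly;
# joining each list into one string gives a hashable key per distinct list.
key = toy["Bank Parent"].apply("_".join)

# cumcount() numbers the rows within each group, starting at 0.
toy["idx"] = toy.groupby(key).cumcount()
print(toy["idx"].tolist())  # [0, 1, 0, 1]
```

Note that two different deals with identical parent lists would end up in the same group under this key.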
Then, row by row, we pick the relevant element from the list in "Bank Parent" and put it into a new column "Parent":
df['Parent'] = df.apply(lambda r: r['Bank Parent'][r['idx']], axis=1)
df
The result now looks like this:
Bank Parent Bank Subsidiary idx Parent
-- -------------------------------------------- ----------------------------- ----- ---------------------------
0 ['Barclays', 'UBS', 'BofA'] Barclays plc 0 Barclays
1 ['Barclays', 'UBS', 'BofA'] UBS AG 1 UBS
2 ['Barclays', 'UBS', 'BofA'] BofA Securities Inc 2 BofA
3 ['China Merchants Bank Co Ltd', 'UniCredit'] CMB International Capital Ltd 0 China Merchants Bank Co Ltd
4 ['China Merchants Bank Co Ltd', 'UniCredit'] UniCredit Bank AG 1 UniCredit
You can then use df.drop(...) to remove the columns you no longer need.
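As a sketch of that cleanup step, the snippet below rebuilds the five-row frame from the table above, applies the two lines of the solution, and then drops the helper columns. Renaming "Parent" back to "Bank Parent" and the variable name result are assumptions about what the final layout should be, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Bank Parent": [["Barclays", "UBS", "BofA"]] * 3
                   + [["China Merchants Bank Co Ltd", "UniCredit"]] * 2,
    "Bank Subsidiary": ["Barclays plc", "UBS AG", "BofA Securities Inc",
                        "CMB International Capital Ltd", "UniCredit Bank AG"],
})

# The two steps from the answer: position within the group, then element lookup.
df["idx"] = df.groupby(df["Bank Parent"].apply("_".join)).cumcount()
df["Parent"] = df.apply(lambda r: r["Bank Parent"][r["idx"]], axis=1)

# Drop the list column and the helper index, then reuse the original column name
# (the rename is an assumption about the desired final layout).
result = (df.drop(columns=["Bank Parent", "idx"])
            .rename(columns={"Parent": "Bank Parent"}))
print(result)
```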
Edit
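If, as in the revised question, the list of parent banks is too short, we can replace the second line of the solution with the following, which essentially pads the list with None values (treated as missing by pandas) before indexing into it: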
df['Parent'] = df.apply(lambda r: (r['Bank Parent'] + [None]*r['idx'])[r['idx']], axis=1)
Output with the revised df:
Bank Parent Bank Subsidiary Deal id idx Parent
-- -------------------------------------------- ----------------------------- --------- ----- ---------------------------
0 ['Barclays', 'UBS', 'BofA'] Barclays plc 1 0 Barclays
1 ['Barclays', 'UBS', 'BofA'] UBS AG 1 1 UBS
2 ['Barclays', 'UBS', 'BofA'] BofA Securities Inc 1 2 BofA
3 ['China Merchants Bank Co Ltd', 'UniCredit'] CMB International Capital Ltd 2 0 China Merchants Bank Co Ltd
4 ['China Merchants Bank Co Ltd', 'UniCredit'] UniCredit Bank AG 2 1 UniCredit
5 ['China Merchants Bank Co Ltd', 'UniCredit'] Goldman Sachs (Asia) LLC 2 2
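For reference, here is one possible end-to-end variant on the revised data. It is a sketch, not the method above: it assumes that "Deal id" identifies each group, so it groups on that column instead of on the joined list, and it uses a hypothetical helper pick_parent with an explicit length check instead of padding.

```python
import pandas as pd

d = {"Bank Parent": [["Barclays", "UBS", "BofA"]] * 3
                    + [["China Merchants Bank Co Ltd", "UniCredit"]] * 3,
     "Bank Subsidiary": ["Barclays plc", "UBS AG", "BofA Securities Inc",
                         "CMB International Capital Ltd", "UniCredit Bank AG",
                         "Goldman Sachs (Asia) LLC"],
     "Deal id": [1, 1, 1, 2, 2, 2]}
df = pd.DataFrame(d)

# Position of each row within its deal (assumes "Deal id" defines the group).
df["idx"] = df.groupby("Deal id").cumcount()


def pick_parent(parents, i):
    """Hypothetical helper: return parents[i], or None if the list is too short."""
    return parents[i] if i < len(parents) else None


df["Parent"] = [pick_parent(p, i) for p, i in zip(df["Bank Parent"], df["idx"])]
print(df[["Parent", "Bank Subsidiary", "Deal id"]])
```

Grouping on "Deal id" keeps two deals apart even when they happen to share the same parent list, and the explicit length check also covers an empty parent list.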