Keep the rows where two columns match within a group, otherwise keep the first row of the group

  • Keywords: keep, one row, two columns, python, pandas, numpy


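I want to left join df2 onto df1 and keep the rows that match within each group. If a group has no matching row, I want to keep the first row of that group instead, so that I end up with df3 (the desired result). I hope you can help me find the optimal solution.

Below is my code for creating the two DataFrames and the desired result.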

import pandas as pd
import numpy as np

market = ['SP', 'SP', 'SP']
underlying = ['TSLA', 'GOOG', 'MSFT']

# DF1
df = pd.DataFrame(list(zip(market, underlying)),
                  columns=['market', 'underlying'])

market2 = ['SP', 'SP', 'SP', 'SP', 'SP']
underlying2 = [None, 'TSLA', 'GBX', 'GBM', 'GBS']
client2 = [17, 12, 100, 21, 10]

# DF2
df2 = pd.DataFrame(list(zip(market2, underlying2, client2)),
                   columns=['market', 'underlying', 'client'])

market3 = ['SP', 'SP', 'SP']
underlying3 = ['TSLA', 'GOOG', 'MSFT']
client3 = [12, 17, 17]

# Desired
df3 = pd.DataFrame(list(zip(market3, underlying3, client3)),
                   columns=['market', 'underlying', 'client'])

# This works but feels suboptimal
df3 = pd.merge(df,
               df2,
               how='left',
               on=['market', 'underlying'])
df3 = pd.merge(df3,
               df2,
               how='left',
               on=['market'])
df3 = df3.drop_duplicates(['market', 'underlying_x'])
df3['client'] = df3['client_x'].combine_first(df3['client_y'])
df3 = df3.drop(labels=['underlying_y', 'client_x', 'client_y'], axis=1)
df3 = df3.rename(columns={'underlying_x': 'underlying'})

I hope you can help. Thanks a lot!

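Store the first value per market (if every value in market is 'SP', the groupby may not be needed), then merge and fill the unmatched rows with that first value: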

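# Fallback value: the first client in each market
fill_value = df2.groupby('market').client.first()
# Alternative, if you specifically want the row whose underlying is None
# (this overrides the assignment above):
fill_value = df2.set_index('market').loc[lambda df: df.underlying.isna(), 'client']

(df
 .merge(
     df2,
     on=['market', 'underlying'],
     how='left')
 .set_index('market')
 # fill_value is aligned on the 'market' index;
 # note that the downcast argument is deprecated in recent pandas releases
 .fillna({'client': fill_value}, downcast='infer')
)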
Output:

       underlying  client
market                   
SP           TSLA      12
SP           GOOG      17
SP           MSFT      17
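
For reference, here is a minimal self-contained sketch of the same idea that avoids the `downcast` argument and keeps `market` as a regular column. The names `fallback` and `merged` are illustrative and not part of the original answer. It also assumes that "first row of the group" can be approximated by `groupby(...).first()`, which returns the first non-null `client` per market. That coincides with the first row here because `client` has no missing values.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "market": ["SP", "SP", "SP"],
    "underlying": ["TSLA", "GOOG", "MSFT"],
})
df2 = pd.DataFrame({
    "market": ["SP", "SP", "SP", "SP", "SP"],
    "underlying": [None, "TSLA", "GBX", "GBM", "GBS"],
    "client": [17, 12, 100, 21, 10],
})

# Fallback per market: first (non-null) client value in each group
fallback = df2.groupby("market")["client"].first()

# Exact match on both keys; unmatched rows get NaN in 'client'
merged = df.merge(df2, on=["market", "underlying"], how="left")

# Fill only the unmatched rows with their market's fallback,
# then restore the integer dtype lost to the intermediate NaN values
merged["client"] = (
    merged["client"]
    .fillna(merged["market"].map(fallback))
    .astype("int64")
)

print(merged)
```

Mapping the fallback through the `market` column, rather than setting it as the index, keeps the result in the same shape as the desired df3.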
