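I have two dataframes that share the same dates and client ids but have different amounts.
I'm trying to get a third dataframe that takes the amount from dfA where dfA has a matching row, and otherwise keeps dfB's 0.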
dfA:
client_id date amount
0 1 2020-07-11 100
1 1 2020-07-10 90
2 1 2020-07-09 80
3 1 2020-07-12 70
3 1 2020-07-01 86
dfB:
client_id date amount
0 1 2020-07-11 0
1 1 2020-07-10 0
2 1 2020-07-09 0
3 1 2020-07-07 0
4 1 2020-07-06 0
5 1 2020-07-05 0
5 1 2020-07-04 0
3 1 2020-07-03 0
4 1 2020-07-02 0
5 1 2020-07-01 0
I want to get:
dfResult:
client_id date amount
0 1 2020-07-11 100
1 1 2020-07-10 90
2 1 2020-07-09 80
3 1 2020-07-07 70
4 1 2020-07-06 0
5 1 2020-07-05 0
5 1 2020-07-04 0
3 1 2020-07-03 0
4 1 2020-07-02 0
5 1 2020-07-01 86
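You can concat the dataframes together, sort by amount in descending order so that dfA's non-zero amounts come before dfB's zeros, and then drop the duplicate (client_id, date) rows, keeping the first one. This relies on dfA's amounts being greater than 0.
import pandas as pd
dfResult = pd.concat([dfA, dfB]).sort_values(by='amount', ascending=False).drop_duplicates(subset=['client_id', 'date'], keep='first').sort_values(by=['client_id', 'date'], ascending=[True, False]).reset_index(drop=True)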
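A runnable sketch of the concat approach, using the question's sample rows typed in by hand. It assumes the dates are ISO-formatted strings (the question doesn't say whether they are strings or datetimes; ISO strings sort correctly either way). Because this approach concatenates, a date that exists only in dfA (2020-07-12 here) also ends up in the result.

```python
import pandas as pd

# Sample data from the question (dates assumed to be ISO strings).
dfA = pd.DataFrame({
    "client_id": [1, 1, 1, 1, 1],
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-12", "2020-07-01"],
    "amount": [100, 90, 80, 70, 86],
})
dfB = pd.DataFrame({
    "client_id": [1] * 10,
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-07", "2020-07-06",
             "2020-07-05", "2020-07-04", "2020-07-03", "2020-07-02", "2020-07-01"],
    "amount": [0] * 10,
})

dfResult = (
    pd.concat([dfA, dfB])
    # Largest amount first, so dfA's value precedes dfB's 0 for the same key.
    .sort_values(by="amount", ascending=False)
    # Keep only the first row for each (client_id, date) pair.
    .drop_duplicates(subset=["client_id", "date"], keep="first")
    # Restore a readable order: client ascending, newest date first.
    .sort_values(by=["client_id", "date"], ascending=[True, False])
    # Drop the duplicated concat index instead of turning it into a column.
    .reset_index(drop=True)
)
print(dfResult)
```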
Try this:
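(
    # Map each dfB date to dfA's amount for that date; unmatched dates become 0.0.
    # Note: this matches on date only and ignores client_id.
    dfB.date.map(
        dfA.set_index('date')['amount'].to_dict()
    ).fillna(0.0)
)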
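The date-only mapping ignores client_id, so with more than one client it can pick up another client's amount for the same date. Below is a sketch that keys the lookup on both columns instead. It assumes each (client_id, date) pair appears at most once in dfA (`Series.reindex` raises on duplicate labels), and the client 2 row is hypothetical, added only to show the difference.

```python
import pandas as pd

# Sample data from the question, plus one HYPOTHETICAL row for client 2
# on a date (2020-07-07) that client 1 has in dfB but not in dfA.
dfA = pd.DataFrame({
    "client_id": [1, 1, 1, 1, 1, 2],
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-12", "2020-07-01", "2020-07-07"],
    "amount": [100, 90, 80, 70, 86, 55],
})
dfB = pd.DataFrame({
    "client_id": [1] * 10,
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-07", "2020-07-06",
             "2020-07-05", "2020-07-04", "2020-07-03", "2020-07-02", "2020-07-01"],
    "amount": [0] * 10,
})

# dfA amounts indexed by the (client_id, date) pair; assumes the pair is unique.
lookup = dfA.set_index(["client_id", "date"])["amount"]
# The same pair for every dfB row, in dfB's row order.
keys = pd.MultiIndex.from_frame(dfB[["client_id", "date"]])
# Align dfA's amounts to dfB's keys; pairs missing from dfA come back as NaN, then 0.0.
dfB["amount"] = lookup.reindex(keys).fillna(0.0).to_numpy()
print(dfB)
```

Client 1's 2020-07-07 row stays at 0.0 here, whereas the date-only map would have given it client 2's 55.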
or
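(
    # Left-merge dfA onto dfB: dfB's own amount column gets the "_x" suffix,
    # dfA's amount keeps its name, and rows with no match in dfA get 0.0.
    dfB.merge(
        dfA, on=['client_id', 'date'], suffixes=("_x", ""), how='left'
    ).fillna(0.0).drop(columns=["amount_x"])
)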
client_id date amount
0 1 2020-07-11 100.0
1 1 2020-07-10 90.0
2 1 2020-07-09 80.0
3 1 2020-07-07 0.0
4 1 2020-07-06 0.0
5 1 2020-07-05 0.0
5 1 2020-07-04 0.0
3 1 2020-07-03 0.0
4 1 2020-07-02 0.0
5 1 2020-07-01 86.0
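A variant of the merge answer, assuming dfA has at most one row per (client_id, date). Merging only dfB's key columns avoids the suffix juggling, `validate="many_to_one"` makes pandas raise a `MergeError` if that assumption is broken, and casting back to int64 restores the integer dtype that the NaN-then-fillna step turns into float.

```python
import pandas as pd

# Sample data from the question (dates assumed to be ISO strings).
dfA = pd.DataFrame({
    "client_id": [1, 1, 1, 1, 1],
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-12", "2020-07-01"],
    "amount": [100, 90, 80, 70, 86],
})
dfB = pd.DataFrame({
    "client_id": [1] * 10,
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-07", "2020-07-06",
             "2020-07-05", "2020-07-04", "2020-07-03", "2020-07-02", "2020-07-01"],
    "amount": [0] * 10,
})

# Merge only dfB's key columns, so there is no second "amount" column to rename or drop.
# validate="many_to_one" raises if a (client_id, date) pair repeats in dfA.
dfResult = dfB[["client_id", "date"]].merge(
    dfA, on=["client_id", "date"], how="left", validate="many_to_one"
)
# Rows with no match in dfA are NaN after the left merge; fill with 0 and restore integers.
dfResult["amount"] = dfResult["amount"].fillna(0).astype("int64")
print(dfResult)
```

As with the merge above, dfA-only dates such as 2020-07-12 are dropped, because the left merge keeps exactly dfB's rows.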