Combine and replace values in pandas DataFrames



I have two DataFrames that share the same date and client_id columns but hold different amounts.

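I am trying to build another DataFrame that takes the amount from dfA wherever dfA has a matching row, and keeps the 0 from dfB where it does not.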

dfA:
client_id  date         amount
0     1        2020-07-11    100
1     1        2020-07-10    90
2     1        2020-07-09    80
3     1        2020-07-12    70
3     1        2020-07-01    86
dfB:
client_id  date         amount
0     1        2020-07-11    0
1     1        2020-07-10    0
2     1        2020-07-09    0
3     1        2020-07-07    0
4     1        2020-07-06    0
5     1        2020-07-05    0
5     1        2020-07-04    0
3     1        2020-07-03    0
4     1        2020-07-02    0
5     1        2020-07-01    0

I would like to get:

dfResult:
client_id  date         amount
0     1        2020-07-11    100
1     1        2020-07-10    90
2     1        2020-07-09    80
3     1        2020-07-07    70
4     1        2020-07-06    0
5     1        2020-07-05    0
5     1        2020-07-04    0
3     1        2020-07-03    0
4     1        2020-07-02    0
5     1        2020-07-01    86

You can concat the two DataFrames, sort by amount, and then drop the duplicates.

dfResult = (
    pd.concat([dfA, dfB])
    .sort_values(by='amount', ascending=False)
    .drop_duplicates(subset=['client_id', 'date'], keep='first')
    .sort_values(by=['client_id', 'date'], ascending=[True, False])
    .reset_index(drop=True)
)
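To check the concat-and-deduplicate idea against the sample data from the question, here is a minimal, self-contained sketch. The variable name `combined` is only illustrative, and the dates are kept as ISO-formatted strings (an assumption; the question does not state the dtype), which still sort correctly.

```python
import pandas as pd

# Sample data copied from the question
dfA = pd.DataFrame({
    "client_id": [1, 1, 1, 1, 1],
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-12", "2020-07-01"],
    "amount": [100, 90, 80, 70, 86],
})
dfB = pd.DataFrame({
    "client_id": [1] * 10,
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-07", "2020-07-06",
             "2020-07-05", "2020-07-04", "2020-07-03", "2020-07-02", "2020-07-01"],
    "amount": [0] * 10,
})

combined = (
    pd.concat([dfA, dfB])
    # Largest amount first, so the dfA row wins over dfB's 0 for the same key
    .sort_values(by="amount", ascending=False)
    .drop_duplicates(subset=["client_id", "date"], keep="first")
    # Restore a readable order: per client, newest date first
    .sort_values(by=["client_id", "date"], ascending=[True, False])
    .reset_index(drop=True)
)
print(combined)
```

Two caveats with this approach: it relies on the dfA amounts being larger than the zeros in dfB, and it also keeps rows that exist only in dfA (here 2020-07-12), so the result has 11 rows rather than the 10 rows of dfB.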

Try this:

(
dfB.date.map(
dfA.set_index('date')['amount'].to_dict()
).fillna(0.0)
)

(
dfB.merge(
dfA, on=['client_id', 'date'], suffixes=("_x", ""), how='left'
).fillna(0.0).drop(columns=["amount_x"])
)

client_id        date  amount
0          1  2020-07-11  100.0
1          1  2020-07-10   90.0
2          1  2020-07-09   80.0
3          1  2020-07-07    0.0
4          1  2020-07-06    0.0
5          1  2020-07-05    0.0
5          1  2020-07-04    0.0
3          1  2020-07-03    0.0
4          1  2020-07-02    0.0
5          1  2020-07-01   86.0
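For anyone who wants to run this end to end, the following is a rough sketch on the question's sample data. It is a variant, not the exact code above: it drops dfB's own amount column before the left merge so that no suffixes are needed, and it casts the filled column back to integers (both are my own choices). The names `keys`, `merged`, and `mapped` are illustrative.

```python
import pandas as pd

# Sample data copied from the question
dfA = pd.DataFrame({
    "client_id": [1, 1, 1, 1, 1],
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-12", "2020-07-01"],
    "amount": [100, 90, 80, 70, 86],
})
dfB = pd.DataFrame({
    "client_id": [1] * 10,
    "date": ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-07", "2020-07-06",
             "2020-07-05", "2020-07-04", "2020-07-03", "2020-07-02", "2020-07-01"],
    "amount": [0] * 10,
})

keys = ["client_id", "date"]

# A left merge keeps every dfB row in dfB's order; keys missing from dfA get NaN
merged = dfB.drop(columns="amount").merge(dfA, on=keys, how="left")
# Replace NaN with 0 and undo the float upcast that NaN caused
merged["amount"] = merged["amount"].fillna(0).astype(int)

# Date-only lookup: equivalent here only because there is a single client_id
mapped = (
    dfB["date"]
    .map(dfA.set_index("date")["amount"].to_dict())
    .fillna(0)
    .astype(int)
)

print(merged)
```

The merge is keyed on both client_id and date, so it stays correct with several clients, whereas the `map` version looks up by date alone. Unlike the concat approach, rows that exist only in dfA (2020-07-12) are not carried over.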
