I want to allocate each customer's payment across states, based on the following data.
payment:
cust_id name date amount
0 A Edward 2021-01-01 3000
1 B Henry 2021-01-01 5000
2 C Ferth 2021-02-01 1000
state:
cust_id contract_id state1 state2 state3
0 A 1 Alabama Alaska Arizona
1 A 2 Indiana Alabama Nebraska
2 B 3 Alabama NaN Arizona
3 C 4 Alaska Nebraska NaN
4 C 5 NaN Maine Nebraska
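Each customer has at least one contract, and each contract covers a different set of states. Every state occurrence must be counted: a state that appears twice across a customer's contracts is counted twice when computing the ratio, and so on. Missing entries (NaN) are not counted. The ratio is then multiplied by the payment amount to get the amount allocated to each state.
Expected output: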
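To make the counting rule concrete, here is a minimal plain-Python sketch for customer A only. The names `contracts_a`, `amount_a`, and `allocation` are made up for this illustration; the state lists and the amount are copied from the tables above.

```python
from collections import Counter

# Customer A's two contracts, copied from the state table (hypothetical names).
contracts_a = [
    ["Alabama", "Alaska", "Arizona"],
    ["Indiana", "Alabama", "Nebraska"],
]
amount_a = 3000

# Count every occurrence, so Alabama (present in both contracts) counts twice.
counts = Counter(s for contract in contracts_a for s in contract)
total = sum(counts.values())  # 6 state occurrences in total

# ratio = occurrences / total; allocated amount = ratio * payment
allocation = {s: amount_a * n / total for s, n in counts.items()}
print(allocation)
```

Alabama gets 2/6 of the 3000 payment and each of the other four states gets 1/6.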
cust_id name date state ratio amount
0 A Edward 2021-01-01 Alabama 0.333333 1000
1 A Edward 2021-01-01 Alaska 0.166667 500
2 A Edward 2021-01-01 Arizona 0.166667 500
3 A Edward 2021-01-01 Indiana 0.166667 500
4 A Edward 2021-01-01 Nebraska 0.166667 500
5 B Henry 2021-01-01 Alabama 0.500000 2500
6 B Henry 2021-01-01 Arizona 0.500000 2500
7 C Ferth 2021-02-01 Alaska 0.250000 250
8 C Ferth 2021-02-01 Nebraska 0.500000 500
9 C Ferth 2021-02-01 Maine 0.250000 250
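This can be done with df.melt, df.groupby, and value_counts with normalize=True. melt flattens the state columns into a single column, so grouping by customer and calling value_counts(normalize=True) gives each state's share of that customer's state occurrences (NaN entries are ignored by default). Then merge the result with the payment dataframe and multiply amount by the share to get the new amount.
Solution: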
u = (state.melt(['cust_id', 'contract_id'], value_name='state')
          .groupby('cust_id')['state'].value_counts(normalize=True)
          .reset_index(name='ratio'))
out = payment.merge(u,on='cust_id')
out['new_amount'] = out['amount']*out['ratio']
Output:
print(out)
cust_id name date amount state ratio new_amount
0 A Edward 2021-01-01 3000 Alabama 0.333333 1000.0
1 A Edward 2021-01-01 3000 Alaska 0.166667 500.0
2 A Edward 2021-01-01 3000 Arizona 0.166667 500.0
3 A Edward 2021-01-01 3000 Indiana 0.166667 500.0
4 A Edward 2021-01-01 3000 Nebraska 0.166667 500.0
5 B Henry 2021-01-01 5000 Alabama 0.500000 2500.0
6 B Henry 2021-01-01 5000 Arizona 0.500000 2500.0
7 C Ferth 2021-02-01 1000 Alaska 0.250000 250.0
8 C Ferth 2021-02-01 1000 Maine 0.250000 250.0
9 C Ferth 2021-02-01 1000 Nebraska 0.500000 500.0
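For anyone who wants to reproduce this, the following is a self-contained sketch that builds the two sample frames from the tables in the question and adds a sanity check. The frame construction and the `totals` check are additions for illustration and are not part of the original answer; the `date` column is kept as plain strings here.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the tables in the question.
payment = pd.DataFrame({
    "cust_id": ["A", "B", "C"],
    "name": ["Edward", "Henry", "Ferth"],
    "date": ["2021-01-01", "2021-01-01", "2021-02-01"],
    "amount": [3000, 5000, 1000],
})
state = pd.DataFrame({
    "cust_id": ["A", "A", "B", "C", "C"],
    "contract_id": [1, 2, 3, 4, 5],
    "state1": ["Alabama", "Indiana", "Alabama", "Alaska", np.nan],
    "state2": ["Alaska", "Alabama", np.nan, "Nebraska", "Maine"],
    "state3": ["Arizona", "Nebraska", "Arizona", np.nan, "Nebraska"],
})

# Long format: one row per (contract, state slot). value_counts skips NaN
# by default, so empty slots do not affect the ratios.
u = (state.melt(id_vars=["cust_id", "contract_id"], value_name="state")
          .groupby("cust_id")["state"].value_counts(normalize=True)
          .reset_index(name="ratio"))

out = payment.merge(u, on="cust_id")
out["new_amount"] = out["amount"] * out["ratio"]

# Sanity check: the allocations of each customer add back up to the payment.
totals = out.groupby("cust_id")["new_amount"].sum()
print(out)
print(totals)
```

Since the ratios within each customer sum to 1, `totals` should match the original `amount` column up to floating-point rounding.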