Distribute values based on string counts in another DataFrame



I want to distribute payments across the states listed below:

payment:

  cust_id    name       date  amount
0       A  Edward 2021-01-01    3000
1       B   Henry 2021-01-01    5000
2       C   Ferth 2021-02-01    1000

state:

  cust_id  contract_id   state1    state2    state3
0       A            1  Alabama    Alaska   Arizona
1       A            2  Indiana   Alabama  Nebraska
2       B            3  Alabama       NaN   Arizona
3       C            4   Alaska  Nebraska       NaN
4       C            5      NaN     Maine  Nebraska

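Each customer has at least one contract, and each contract covers a different set of states. Every state occurrence must be counted: a state that appears twice for a customer counts twice when computing the ratio, and so on. The ratio is then multiplied by the payment amount to get the amount allocated to each state.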
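To make the counting rule concrete, here is a minimal sketch for customer A using only the standard library. The names `contracts_a` and `amount_a` are hypothetical and introduced for illustration; the values are copied from the two tables above.

```python
from collections import Counter

# State columns of customer A's two contracts, copied from the state table.
# (contracts_a and amount_a are illustrative names, not from the question.)
contracts_a = [
    ["Alabama", "Alaska", "Arizona"],    # contract 1
    ["Indiana", "Alabama", "Nebraska"],  # contract 2
]
amount_a = 3000  # customer A's payment

# Count every occurrence; a state listed in two contracts counts twice.
counts = Counter(s for contract in contracts_a for s in contract if s is not None)
total = sum(counts.values())  # number of state entries for this customer

# Ratio = occurrences / total entries; allocation = ratio * payment amount.
ratios = {s: n / total for s, n in counts.items()}
allocation = {s: amount_a * r for s, r in ratios.items()}

for s in sorted(allocation):
    print(s, round(ratios[s], 6), round(allocation[s], 2))
```

Alabama appears in both contracts, so it gets 2 of the 6 entries and therefore a third of the payment; the other four states get a sixth each.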

Expected output:

  cust_id    name       date     state     ratio  amount
0       A  Edward 2021-01-01   Alabama  0.333333    1000
1       A  Edward 2021-01-01    Alaska  0.166667     500
2       A  Edward 2021-01-01   Arizona  0.166667     500
3       A  Edward 2021-01-01   Indiana  0.166667     500
4       A  Edward 2021-01-01  Nebraska  0.166667     500
5       B   Henry 2021-01-01   Alabama  0.500000    2500
6       B   Henry 2021-01-01   Arizona  0.500000    2500
7       C   Ferth 2021-02-01    Alaska  0.250000     250
8       C   Ferth 2021-02-01  Nebraska  0.500000     500
9       C   Ferth 2021-02-01     Maine  0.250000     250

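This can be done with df.melt, df.groupby and value_counts(normalize=True). melt flattens the state columns into one row per state for each customer, and value_counts(normalize=True) gives each state's share of that customer's entries based on how often it occurs. Then merge the result with the payment DataFrame and multiply amount by the share to get the new amount: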

Solution:

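# Flatten the state columns into one row per (customer, contract, state).
u = (state.melt(['cust_id', 'contract_id'], value_name='state')
     # Share of each state among a customer's entries (NaN is dropped by default).
     .groupby('cust_id')['state'].value_counts(normalize=True)
     .reset_index(name='ratio'))
# Attach the shares to the payments and scale each amount.
out = payment.merge(u, on='cust_id')
out['new_amount'] = out['amount'] * out['ratio']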

Output:

print(out)
  cust_id    name        date  amount     state     ratio  new_amount
0       A  Edward  2021-01-01    3000   Alabama  0.333333      1000.0
1       A  Edward  2021-01-01    3000    Alaska  0.166667       500.0
2       A  Edward  2021-01-01    3000   Arizona  0.166667       500.0
3       A  Edward  2021-01-01    3000   Indiana  0.166667       500.0
4       A  Edward  2021-01-01    3000  Nebraska  0.166667       500.0
5       B   Henry  2021-01-01    5000   Alabama  0.500000      2500.0
6       B   Henry  2021-01-01    5000   Arizona  0.500000      2500.0
7       C   Ferth  2021-02-01    1000    Alaska  0.250000       250.0
8       C   Ferth  2021-02-01    1000     Maine  0.250000       250.0
9       C   Ferth  2021-02-01    1000  Nebraska  0.500000       500.0
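To try this end to end, the following sketch rebuilds both frames from the tables in the question and adds a sanity check that each customer's allocated amounts sum back to the original payment. The frame construction and the `check` variable are assumptions added for illustration. The row order within a customer may differ from the printout above depending on the pandas version, so the check does not rely on it.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the tables in the question.
payment = pd.DataFrame({
    "cust_id": ["A", "B", "C"],
    "name": ["Edward", "Henry", "Ferth"],
    "date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-02-01"]),
    "amount": [3000, 5000, 1000],
})
state = pd.DataFrame({
    "cust_id": ["A", "A", "B", "C", "C"],
    "contract_id": [1, 2, 3, 4, 5],
    "state1": ["Alabama", "Indiana", "Alabama", "Alaska", np.nan],
    "state2": ["Alaska", "Alabama", np.nan, "Nebraska", "Maine"],
    "state3": ["Arizona", "Nebraska", "Arizona", np.nan, "Nebraska"],
})

# Same pipeline as the solution: melt, per-customer shares, merge, scale.
u = (
    state.melt(["cust_id", "contract_id"], value_name="state")
    .groupby("cust_id")["state"]
    .value_counts(normalize=True)  # NaN entries are dropped by default
    .reset_index(name="ratio")
)
out = payment.merge(u, on="cust_id")
out["new_amount"] = out["amount"] * out["ratio"]

# Sanity check (illustrative): allocations per customer add up to the payment.
check = out.groupby("cust_id")["new_amount"].sum()
print(out)
print(check)
```

Because the shares within each customer sum to 1, `check` should match the `amount` column of `payment` up to floating-point rounding.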
