I have the following DataFrame:
import pandas as pd

df = pd.DataFrame(columns=['Name', 'Status', 'Profit', 'Promotion', 'Product', 'Visits'])
df['Name'] = ['Andy','Andy','Brad','Brad','Cynthia','Cynthia']
df['Status'] =['Old','New','Old','New','Old','New']
df['Profit'] = [140,60,110,90,20,100]
df['Promotion'] = [25,30,40,10,22,36]
df['Product'] = [8,6,18,10,7,12]
df['Visits'] = [11,4,7,3,12,5]
df['Month'] = 'Jan'
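For the 'Profit', 'Promotion' and 'Product' columns, I want to compute each row's percentage of the total within its 'Name' group, to arrive at the following DataFrame: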
df['Profit'] = [70,30,55,45,17,83]
df['Promotion'] = [45,55,80,20,38,62]
df['Product'] = [57,43,64,36,37,63]
df
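I tried grouping by 'Name', 'Status' and 'Month', and tried a groupby solution similar to the one given in "Pandas percentage of total with groupby", but I can't seem to get the output I want.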
Use GroupBy.transform to compute the sum for each Name, divide the original columns by those sums, multiply by 100, and finally round:
cols = ['Profit', 'Promotion', 'Product']
# transform('sum') returns the group total repeated on every row of that group
print(df.groupby('Name')[cols].transform('sum'))
   Profit  Promotion  Product
0     200         55       14
1     200         55       14
2     200         50       28
3     200         50       28
4     120         58       19
5     120         58       19
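The totals above come back with one row per original row, which is what lets them be divided into the original columns directly. As a small illustration (a sketch on a reduced copy of the data, using only the Name and Profit columns), compare a plain grouped sum with transform:

```python
import pandas as pd

# Reduced copy of the question's data, for illustration only
df = pd.DataFrame({
    'Name': ['Andy', 'Andy', 'Brad', 'Brad', 'Cynthia', 'Cynthia'],
    'Profit': [140, 60, 110, 90, 20, 100],
})

# Plain aggregation: one value per group, indexed by Name
per_group = df.groupby('Name')['Profit'].sum()

# transform: the group total broadcast back onto the original index
per_row = df.groupby('Name')['Profit'].transform('sum')

print(per_group.shape)                  # one entry per Name
print(per_row.shape)                    # one entry per original row
print(per_row.index.equals(df.index))   # same index as df
```

Because per_row shares df's index, df['Profit'].div(per_row) lines up row by row, whereas dividing by per_group would try to align on the Name labels instead.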
# share of the per-Name total, expressed as a rounded percentage
df[cols] = df[cols].div(df.groupby('Name')[cols].transform('sum')).mul(100).round()
print(df)
      Name Status  Profit  Promotion  Product  Visits Month
0     Andy    Old    70.0       45.0     57.0      11   Jan
1     Andy    New    30.0       55.0     43.0       4   Jan
2     Brad    Old    55.0       80.0     64.0       7   Jan
3     Brad    New    45.0       20.0     36.0       3   Jan
4  Cynthia    Old    17.0       38.0     37.0      12   Jan
5  Cynthia    New    83.0       62.0     63.0       5   Jan
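The result above is float-typed, while the desired output in the question shows whole numbers. One possible way to wrap the same steps and cast back to integers is sketched below. The helper name pct_of_group_total and its parameters are hypothetical, introduced only for this example, and the integer cast assumes no group total is zero (a zero total would give NaN or inf, which cannot be cast to int).

```python
import pandas as pd

def pct_of_group_total(frame, by, cols, decimals=0):
    """Hypothetical helper: return a copy with cols as % of their group total."""
    out = frame.copy()
    # Group totals repeated on every row, aligned with the original index
    totals = out.groupby(by)[cols].transform('sum')
    out[cols] = out[cols].div(totals).mul(100).round(decimals)
    return out

df = pd.DataFrame({
    'Name': ['Andy', 'Andy', 'Brad', 'Brad', 'Cynthia', 'Cynthia'],
    'Status': ['Old', 'New', 'Old', 'New', 'Old', 'New'],
    'Profit': [140, 60, 110, 90, 20, 100],
    'Promotion': [25, 30, 40, 10, 22, 36],
    'Product': [8, 6, 18, 10, 7, 12],
    'Visits': [11, 4, 7, 3, 12, 5],
    'Month': 'Jan',
})

cols = ['Profit', 'Promotion', 'Product']
result = pct_of_group_total(df, 'Name', cols)
# Assumes every group total is non-zero, so there are no NaN/inf values to cast
result[cols] = result[cols].astype(int)
print(result)
```

Working on a copy leaves the original df untouched, which makes it easier to compare the raw values with the percentages afterwards.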