Hi, I have a dataset that looks much like the DataFrame below:
# Table 1
import pandas as pd

print("Table1: Current Table")
data = [['ALFA', 35, 47, 67, 44, 193],
        ['Bravo', 51, 52, 16, 8, 127],
        ['Charlie', 59, 75, 2, 14, 150],
        ['Delta', 59, 75, 2, 34, 170],
        ['Echo', 59, 75, 2, 14, 150],
        ['Foxtrot', 40, 43, 26, 27, 136],
        ['Golf', 35, 31, 22, 13, 101],
        ['Hotel', 89, 58, 24, 34, 205]]
df = pd.DataFrame(data, columns=['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Total'])
# df.loc[:, 'Total'] = df.sum(axis=1)
print(df)
I want to express every cell as a percentage of its row total (the value in the "Total" column), so that the result looks like this:
#Table2 :
print('Table2: Expected Outcome')
data2 = [['ALFA', 18.1, 24.4, 34.7, 22.8, 193],
         ['Bravo', 40.2, 40.9, 12.6, 6.3, 127],
         ['Charlie', 39.3, 50.0, 1.3, 9.3, 150],
         ['Delta', 34.7, 44.1, 1.2, 20.0, 170],
         ['Echo', 39.3, 50.0, 1.3, 9.3, 150],
         ['Foxtrot', 29.4, 31.6, 19.1, 19.9, 136],
         ['Golf', 34.7, 30.7, 21.8, 12.9, 101],
         ['Hotel', 43.4, 28.3, 11.7, 16.6, 205]]
df2 = pd.DataFrame(data2, columns= ['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Total']) #.round(decimals=1)
#df.loc[:,'Total'] = df.sum(axis=1)
print(df2)
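I don't really mind if the Total column gets changed, recalculated, or has to be dropped along the way; but for completeness it would be nice to end up with both the "Total" column and the cell percentages.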
Use DataFrame.div for a fast vectorized division over all columns except Objects and Total:
c = df.columns.difference(['Objects','Total'])
df[c] = df[c].div(df['Total'], axis=0).mul(100)
print(df)
   Objects    Column1    Column2    Column3    Column4  Total
0     ALFA  18.134715  24.352332  34.715026  22.797927    193
1    Bravo  40.157480  40.944882  12.598425   6.299213    127
2  Charlie  39.333333  50.000000   1.333333   9.333333    150
3    Delta  34.705882  44.117647   1.176471  20.000000    170
4     Echo  39.333333  50.000000   1.333333   9.333333    150
5  Foxtrot  29.411765  31.617647  19.117647  19.852941    136
6     Golf  34.653465  30.693069  21.782178  12.871287    101
7    Hotel  43.414634  28.292683  11.707317  16.585366    205
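If you want the one-decimal figures from the expected table, chaining a round onto the division should be enough. This is only a sketch: it works on a copy (the name pct is mine, not from the question) so the original counts stay available.

```python
import pandas as pd

data = [['ALFA', 35, 47, 67, 44, 193],
        ['Bravo', 51, 52, 16, 8, 127],
        ['Charlie', 59, 75, 2, 14, 150],
        ['Delta', 59, 75, 2, 34, 170],
        ['Echo', 59, 75, 2, 14, 150],
        ['Foxtrot', 40, 43, 26, 27, 136],
        ['Golf', 35, 31, 22, 13, 101],
        ['Hotel', 89, 58, 24, 34, 205]]
df = pd.DataFrame(data, columns=['Objects', 'Column1', 'Column2',
                                 'Column3', 'Column4', 'Total'])

# Every column except the label and the total
value_cols = df.columns.difference(['Objects', 'Total'])

# Work on a copy so df keeps the raw counts
pct = df.copy()
pct[value_cols] = (pct[value_cols]
                   .div(pct['Total'], axis=0)  # divide each row by its own Total
                   .mul(100)
                   .round(1))                  # one decimal, as in Table 2
print(pct)
```

The Total column is left as it was, so you keep both the totals and the percentages.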
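You can try using apply. Note that this divides each row by the sum of the four value columns rather than by the Total column, and it gives fractions; multiply by 100 if you need percentages: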
df[['Column1', 'Column2', 'Column3', 'Column4']] = df[['Column1', 'Column2', 'Column3', 'Column4']].apply(lambda x: x/x.sum(), axis=1)
Output:
Table1: Current Table
   Objects   Column1   Column2   Column3   Column4  Total
0     ALFA  0.181347  0.243523  0.347150  0.227979    193
1    Bravo  0.401575  0.409449  0.125984  0.062992    127
2  Charlie  0.393333  0.500000  0.013333  0.093333    150
3    Delta  0.347059  0.441176  0.011765  0.200000    170
4     Echo  0.393333  0.500000  0.013333  0.093333    150
5  Foxtrot  0.294118  0.316176  0.191176  0.198529    136
6     Golf  0.346535  0.306931  0.217822  0.128713    101
7    Hotel  0.434146  0.282927  0.117073  0.165854    205
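The same row-sum idea can probably be written without apply. The sketch below uses only the first three rows of the question's data to stay short, and the names cols, row_sums, matches_total, and result are my own. It also checks that the recomputed row sums agree with the existing Total column, which is the assumption that makes dividing by the row sum equivalent to dividing by Total.

```python
import pandas as pd

# First three rows of the question's data, to keep the example short
data = [['ALFA', 35, 47, 67, 44, 193],
        ['Bravo', 51, 52, 16, 8, 127],
        ['Charlie', 59, 75, 2, 14, 150]]
df = pd.DataFrame(data, columns=['Objects', 'Column1', 'Column2',
                                 'Column3', 'Column4', 'Total'])

cols = ['Column1', 'Column2', 'Column3', 'Column4']

# Recompute the row totals from the value columns only
row_sums = df[cols].sum(axis=1)

# True when the stored Total column matches the recomputed sums
matches_total = bool(row_sums.eq(df['Total']).all())

# Vectorized equivalent of apply(lambda x: x / x.sum(), axis=1), scaled to percent
result = df.copy()
result[cols] = df[cols].div(row_sums, axis=0).mul(100)

print(matches_total)
print(result)
```

If matches_total comes back False, the Total column is stale and you would have to decide which denominator you actually want.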
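Create a new DataFrame from the same data. Then loop over every column except the first (Objects) and the last (Total) using df.columns[1:-1], and compute the percentage for each.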
df1 = pd.DataFrame(data, columns=['Objects', 'Column1', 'Column2', 'Column3', 'Column4', 'Total'])
for col in df.columns[1:-1]:
    df1[col] = df[col] * 100 / df.Total
df1
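If you need this in more than one place, the loop could be wrapped in a small function. The helper below, to_row_percentages, is hypothetical (it is not a pandas function), and its parameter names are my own. It assumes every column other than the label and total columns is numeric, and it is shown on the first three rows of the question's data only.

```python
import pandas as pd

def to_row_percentages(frame, label_col='Objects', total_col='Total', decimals=None):
    """Hypothetical helper: return a copy with value columns as % of total_col."""
    out = frame.copy()
    # Everything except the label and total columns is treated as a value column
    value_cols = [c for c in out.columns if c not in (label_col, total_col)]
    for col in value_cols:
        out[col] = frame[col] * 100 / frame[total_col]
    if decimals is not None:
        out[value_cols] = out[value_cols].round(decimals)
    return out

# First three rows of the question's data
data = [['ALFA', 35, 47, 67, 44, 193],
        ['Bravo', 51, 52, 16, 8, 127],
        ['Charlie', 59, 75, 2, 14, 150]]
df = pd.DataFrame(data, columns=['Objects', 'Column1', 'Column2',
                                 'Column3', 'Column4', 'Total'])

df1 = to_row_percentages(df, decimals=1)
print(df1)
```

Because the function copies its input, df still holds the raw counts afterwards.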