How to calculate percentages and format the result as a table

I have a table that looks like this:

c_id  soap_spend  towel_spend  year_spend cluster
c1     1           2              3          1
c2     2           4              6          2
c3     1           2              3          2
c4     3           2              5          1
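To make the examples reproducible, the sample table can be loaded into a pandas DataFrame. This is a minimal sketch: the values are copied from the table above, and the name `df` is assumed to match the code further down.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    'c_id': ['c1', 'c2', 'c3', 'c4'],
    'soap_spend': [1, 2, 1, 3],
    'towel_spend': [2, 4, 2, 2],
    'year_spend': [3, 6, 3, 5],
    'cluster': [1, 2, 2, 1],
})
print(df)
```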

I want to return two tables.

Table 1:

                 cluster_1          cluster_2
% soap_spend     a = (1+3)/(3+5)    c = (2+1)/(6+3)    -- soap_spend.sum / year_spend.sum
% towel_spend    b = (2+2)/(3+5)    d = (2+4)/(6+3)    -- towel_spend.sum / year_spend.sum
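Plugging the sample values into these formulas gives the expected contents of Table 1 (cluster 1 is rows c1 and c4, cluster 2 is rows c2 and c3):

```latex
\begin{aligned}
a &= \frac{1+3}{3+5} = \frac{4}{8} = 0.5, &\qquad c &= \frac{2+1}{6+3} = \frac{3}{9} \approx 0.333,\\
b &= \frac{2+2}{3+5} = \frac{4}{8} = 0.5, &\qquad d &= \frac{2+4}{6+3} = \frac{6}{9} \approx 0.667.
\end{aligned}
```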

Table 2:

Uses the results from Table 1:

                 cluster_1              cluster_2
% soap_spend     a/mean(soap_spend)     c/mean(towel_spend)
% towel_spend    b/mean(soap_spend)     d/mean(towel_spend)

My code:

cols = ['soap_spend', 'towel_spend']
df.groupby('cluster').apply(df[col].sum()/df['year_spend'].sum()

Any suggestions on how to fix the code?

You don't need to group by cluster; selecting each cluster's rows with .loc and summing them is enough:

numerator = df['soap_spend'].loc[df['cluster'] == val].sum()
denominator = df['year_spend'].loc[df['cluster'] == val].sum()
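As a concrete check, here is the same boolean-mask selection run on the sample data for a single cluster. The choice `val = 1` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'soap_spend': [1, 2, 1, 3],
    'year_spend': [3, 6, 3, 5],
    'cluster': [1, 2, 2, 1],
})

val = 1  # cluster to inspect (illustrative choice)
mask = df['cluster'] == val                      # True for rows c1 and c4
numerator = df['soap_spend'].loc[mask].sum()     # 1 + 3
denominator = df['year_spend'].loc[mask].sum()   # 3 + 5
print(numerator, denominator, numerator / denominator)  # 4 8 0.5
```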

The complete code would look like this:

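import pandas as pd

# Table 1: one row per spend category, one column per cluster
df_results = pd.DataFrame(index=['% soap_spend', '% towel_spend'])

for val in df.cluster.unique():
    in_cluster = df['cluster'] == val  # boolean mask for this cluster's rows
    denominator = df['year_spend'].loc[in_cluster].sum()
    soap_numerator = df['soap_spend'].loc[in_cluster].sum()
    towel_numerator = df['towel_spend'].loc[in_cluster].sum()
    soap_spend = soap_numerator / denominator
    towel_spend = towel_numerator / denominator
    # add this cluster as a new column (the list is aligned with the index above)
    df_results['cluster_{}'.format(int(val))] = [soap_spend, towel_spend]

# put the columns in cluster order
df_results = df_results[['cluster_1', 'cluster_2']]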
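Since the original attempt used groupby, the same Table 1 can also be built without an explicit loop: sum each spend column per cluster, divide by that cluster's year_spend total, then transpose so that clusters become columns. This is an alternative sketch rather than part of the answer; the row and column labels are chosen to match Table 1.

```python
import pandas as pd

df = pd.DataFrame({
    'soap_spend': [1, 2, 1, 3],
    'towel_spend': [2, 4, 2, 2],
    'year_spend': [3, 6, 3, 5],
    'cluster': [1, 2, 2, 1],
})

cols = ['soap_spend', 'towel_spend']
sums = df.groupby('cluster')[cols + ['year_spend']].sum()  # one row per cluster

# Divide each spend column by its cluster's year_spend total (aligned on the cluster index)
shares = sums[cols].div(sums['year_spend'], axis=0)

# Transpose so clusters are columns, then relabel to match Table 1
table_1 = shares.T
table_1.columns = ['cluster_{}'.format(c) for c in table_1.columns]
table_1.index = ['% ' + c for c in table_1.index]
print(table_1)
```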

For Table 2, divide each cell of Table 1 by the mean of its row:

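table_1 = df_results  # Table 1 from the loop above
table_2 = table_1.copy()
row_means = table_1.mean(axis=1)  # means taken from the unmodified Table 1
for row in range(table_2.shape[0]):
    for col in range(table_2.shape[1]):
        # iat does positional assignment and avoids chained-assignment problems
        table_2.iat[row, col] = table_1.iat[row, col] / row_means.iloc[row]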
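A loop-free equivalent divides Table 1 by its row means with DataFrame.div. The sketch below hard-codes the Table 1 values for the sample data and follows the answer's reading of the question (each cell divided by the mean of its row). If mean(soap_spend) in the question means something else, such as a mean over the raw data, the divisor would need to change.

```python
import pandas as pd

# Table 1 for the sample data: rows are spend shares, columns are clusters
table_1 = pd.DataFrame(
    {'cluster_1': [4 / 8, 4 / 8], 'cluster_2': [3 / 9, 6 / 9]},
    index=['% soap_spend', '% towel_spend'],
)

# Divide each cell by its row mean (axis=0 aligns the Series with the row index)
table_2 = table_1.div(table_1.mean(axis=1), axis=0)
print(table_2.round(3))
```

With this reading every row of Table 2 averages to 1; for example, the soap row becomes 1.2 and 0.8.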
