Pandas normalization of groupby.sum

I have a pandas DataFrame that looks like this:

**I     SI     weight**
1     3      0.3  
2     4      0.2
1     3      0.5
1     5      0.5

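I need to do the following: for a given value of I, consider each value of SI and add up the total weight. In the end, for each realization, I should have something like this: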

I = 1     SI = 3      weight = 0.8
          SI = 5      weight = 0.5
I = 2     SI = 4      weight = 0.2

This is easily achieved by calling groupby and sum:

import pandas as pd

name = ['I', 'SI', 'weight']
Location = 'Simulationsdata/prova.csv'
# tab-separated file without a header row
df = pd.read_csv(Location, names=name, sep='\t', encoding='latin1')
# total weight for each (I, SI) pair
results = df.groupby(['I', 'SI']).weight.sum()

Now I would like the weights to be normalized so that they sum to 1 for each value of I, so it should look like this:

I = 1     SI = 3      weight = 0.615
          SI = 5      weight = 0.385
I = 2     SI = 4      weight = 1

I tried this:

for idx2, j in enumerate(results.index.get_level_values(1).unique()):
    norm = [float(i)/sum(results.loc[j]) for i in results.loc[j]]

But when I try to plot the distribution of SI for each I, I find that SI has been normalized as well, which I do not want.

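P.S. This question is related to this one, but since it concerns a different aspect of the problem, I thought it would be better to ask it separately.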

You should be able to divide the weight column by its own sum within each value of I:

# example data
df
   I  SI  weight
0  1   3     0.3
1  2   4     0.2
2  1   3     0.5
3  1   5     0.5
# two-level groupby, with the result as a DataFrame instead of Series:
# df['col'] gives a Series, df[['col']] gives a DF
res = df.groupby(['I', 'SI'])[['weight']].sum()
res
      weight
I SI
1 3      0.8
  5      0.5
2 4      0.2
# Get the sum of weights for each value of I,
# which will serve as denominators in normalization
denom = res.groupby('I')['weight'].sum()
denom
I
1    1.3
2    0.2
Name: weight, dtype: float64
# Divide each result value by its index-matched
# denominator value
res.weight = res.weight / denom
res
        weight
I SI
1 3   0.615385
  5   0.384615
2 4   1.000000
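
As a possible alternative, the same normalization can be sketched with `groupby(...).transform("sum")`, which returns the per-I total already aligned with each row, so no separate denominator lookup is needed. This is only a minimal sketch: the DataFrame is rebuilt inline from the example data, and the names `per_I_total` and `normalized` are made up for illustration.

```python
import pandas as pd

# Same example data as above, built inline so the snippet is self-contained.
df = pd.DataFrame({
    "I": [1, 2, 1, 1],
    "SI": [3, 4, 3, 5],
    "weight": [0.3, 0.2, 0.5, 0.5],
})

# Total weight for every (I, SI) pair.
res = df.groupby(["I", "SI"])[["weight"]].sum()

# transform("sum") gives the per-I total repeated on every row of res,
# so it shares res's (I, SI) index and divides element by element.
per_I_total = res.groupby(level="I")["weight"].transform("sum")
normalized = res["weight"] / per_I_total

print(normalized)
```

Only the weight values are divided here; I and SI stay in the index unchanged, so the SI values themselves are not rescaled.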
