Sum across columns and replace individual values when a condition is met

import pandas
d = {
    'col1': [25, 20, 30],
    'col2': [25, 20, 30],
    'col3': [25, 20, 30],
    'col4': [25, 39, 11],
}
df = pandas.DataFrame(data=d)

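How would I loop through this DataFrame, add col1 + col2 + col3 + col4 at each index, and, if the sum is not equal to 100, replace each value at that index with the value divided by the sum, e.g. col1/(col1+col2+col3+col4)? That way, when you sum col1 + col2 + col3 + col4 again, the values at that index add up to 100.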

So, for example, at index 0, col1 + col2 + col3 + col4 equals 100, so move on to the next index. At index 1, however, the sum is 99, so take 20/99 and make it the new value at that position, and so on.
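To make the arithmetic concrete, here is a minimal sketch that prints the sum at each index for the data above. The `pd` alias and the name `row_sums` are assumptions for illustration; the question itself uses `import pandas`.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [25, 20, 30],
    'col2': [25, 20, 30],
    'col3': [25, 20, 30],
    'col4': [25, 39, 11],
})

# Sum across the columns, one total per index
row_sums = df.sum(axis=1)
print(row_sums.tolist())  # [100, 99, 101]
```

Only index 0 sums to 100, so indexes 1 and 2 are the ones whose values would be divided by 99 and 101 respectively.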

Expected output:


d = {
    'col1': [25, 20/99, 30/101],
    'col2': [25, 20/99, 30/101],
    'col3': [25, 20/99, 30/101],
    'col4': [25, 39/99, 11/101],
}
df = pandas.DataFrame(data=d)

Here is a vectorized version:

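import numpy as np
import pandas as pd
# True for rows whose values do not sum to 100
c = df.sum(axis=1).ne(100)
# Broadcast the row mask across the columns; flagged rows are divided by their row sum
vals = np.where(c.to_numpy()[:, None], df.div(df.sum(axis=1), axis=0), df)
new_df = pd.DataFrame(vals, index=df.index, columns=df.columns)
# To overwrite the original df instead, use: df[:] = vals
print(new_df)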

       col1      col2      col3       col4
0  25.00000  25.00000  25.00000  25.000000
1   0.20202   0.20202   0.20202   0.393939
2   0.29703   0.29703   0.29703   0.108911
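For comparison, the same result can probably be reached with boolean row selection instead of `np.where`. This is only a sketch under the question's data; the names `row_sums`, `needs_fix`, and `result` are made up for illustration and do not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [25, 20, 30],
    'col2': [25, 20, 30],
    'col3': [25, 20, 30],
    'col4': [25, 39, 11],
})

row_sums = df.sum(axis=1)
needs_fix = row_sums.ne(100)  # True where the row does not sum to 100

# Work on a float copy so the quotients fit the column dtype
result = df.astype(float)
# Divide only the flagged rows by their own row sum
result.loc[needs_fix] = df.loc[needs_fix].div(row_sums[needs_fix], axis=0)
print(result)
```

Note that after the division the adjusted rows sum to 1 rather than 100, which matches the expected output in the question (20/99, 39/99, and so on). If percentages are wanted, multiplying the quotients by 100 would be one way to get them.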

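This does what you want by first building each column into its own list (here a "column" means the values at index i, one from each key of d):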

col = [d[row][i] for row in d]

Then apply the process you described:

if sum(col) != 100:
    newcol = [n/sum(col) for n in col]

The column can then be reinserted. The final product:

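for i in range(0, 3):
    # Values at index i, one from each key of d
    col = [d[row][i] for row in d]
    if sum(col) != 100:
        newcol = [n/sum(col) for n in col]
    else:
        newcol = col.copy()
    # Write the values back; the trailing digit of each key ('col1' -> 1) picks the position
    for row in d:
        d[row][i] = newcol[int(row[-1:])-1]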
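The write-back step above depends on the keys ending in a digit (`col1` through `col4`). A sketch that drops that assumption, requiring only that every list in `d` has the same length, could look like this; the names `keys` and `total` are illustrative and not part of the answer.

```python
d = {
    'col1': [25, 20, 30],
    'col2': [25, 20, 30],
    'col3': [25, 20, 30],
    'col4': [25, 39, 11],
}

keys = list(d)
for i in range(len(d[keys[0]])):
    # Sum of the values at index i across all keys
    total = sum(d[k][i] for k in keys)
    if total != 100:
        # Replace each value at this index with its share of the total
        for k in keys:
            d[k][i] = d[k][i] / total

print(d['col1'])
```

Because the total is computed before any value at that index is overwritten, every value is divided by the same sum.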

I ended up using this approach to solve my problem:

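# Cast to float first so the quotients can be stored in the originally integer columns
df = df.astype(float)
for i in range(len(df)):
    # Row total, computed once before any value in the row is overwritten
    x = (df.loc[i,'col1']+df.loc[i,'col2']+df.loc[i,'col3']+df.loc[i,'col4'])
    # Note: this divides every row, including rows that already sum to 100
    for j in range(0,4):
        df.iloc[i,j] = (df.iloc[i,j])/(x)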
