Pandas DataFrame: sum all values in the preceding columns that match a specific condition and add the result to a new column



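I may be missing something, but I can't find a solution. Is there a way in Python to fill a new column with the sum of the values in each row that satisfy a certain condition? In Excel, I would put the following formula in the new column and fill it down: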

=SUMIF(A1:C1, ">0")
val1   val2   val3   output
 0.5    0.7   -0.9   1.2
 0.3   -0.7          0.3
-0.5   -0.7   -0.9   0

pandas can handle this out of the box, for example:

import pandas as pd
df = pd.DataFrame([[0.5,.7,-.9],[0.3,-.7,None],[-0.5,-.7,-.9]], columns=['val1','val2','val3'])
df['output'] = df[df>0].sum(axis=1)

Another way, somewhat closer to SUMIF:

# this is the "IF"
is_positive = df.loc[:, "val1": "val3"] > 0
# this is selecting the parts where condition holds & sums
df["output"] = df.loc[:, "val1": "val3"][is_positive].sum(axis=1)

where axis=1 in the last line sums across the columns of each row, giving

>>> df
   val1  val2  val3  output
0   0.5   0.7  -0.9     1.2
1   0.3  -0.7   NaN     0.3
2  -0.5  -0.7  -0.9     0.0
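
To see why this works, it can help to look at the intermediate result. The sketch below reuses the sample data from the answer above; the names positives and output are only illustrative. Boolean indexing with a DataFrame keeps the matching cells and replaces the rest with NaN, and sum then skips NaN by default, so a row with no positive values sums to 0.0, just as SUMIF would return 0.

```python
import pandas as pd

df = pd.DataFrame(
    [[0.5, 0.7, -0.9], [0.3, -0.7, None], [-0.5, -0.7, -0.9]],
    columns=["val1", "val2", "val3"],
)

# Boolean indexing with a DataFrame keeps matching cells and sets the rest to NaN.
positives = df[df > 0]

# sum() skips NaN by default (skipna=True); an all-NaN row sums to 0.0.
output = positives.sum(axis=1)

print(positives)
print(output)
```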

Use DataFrame.clip before sum:

df['total'] = df[['val1','val2','val3']].clip(lower=0).sum(axis=1)
#solution by Nk03 from comments
cols = ['val1','val2','val3']
df['total'] = df[cols].mask(df[cols]<0).sum(axis=1)
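
The two variants reach the same totals by different routes, which the following sketch tries to make visible on the sample data from the first answer (the intermediate names clipped and masked are illustrative): clip(lower=0) turns negative cells into 0, while mask turns them into NaN, which sum then skips.

```python
import pandas as pd

df = pd.DataFrame(
    [[0.5, 0.7, -0.9], [0.3, -0.7, None], [-0.5, -0.7, -0.9]],
    columns=["val1", "val2", "val3"],
)
cols = ["val1", "val2", "val3"]

clipped = df[cols].clip(lower=0)      # negatives become 0.0, NaN stays NaN
masked = df[cols].mask(df[cols] < 0)  # negatives become NaN

total_clip = clipped.sum(axis=1)
total_mask = masked.sum(axis=1)

print(clipped)
print(masked)
print(total_clip.equals(total_mask))
```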

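EDIT: To build the mask from a different set of columns, convert those columns to a NumPy array so that the condition is applied by position rather than by column label: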

df['total'] = df.loc[:, "D":"F"].mask(df.loc[:, "A":"C"].to_numpy() == 'Y', 0).sum(axis=1)
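
The thread does not show the data behind columns A to F, so the frame below is made up purely for illustration: A to C are assumed to hold 'Y'/'N' flags and D to F the numbers to sum. Under that assumption, a value in D, E or F is replaced by 0 whenever the flag in the same position of A, B or C is 'Y'.

```python
import pandas as pd

# Hypothetical data: flag columns A..C, value columns D..F.
df = pd.DataFrame({
    "A": ["Y", "N", "N"],
    "B": ["N", "N", "Y"],
    "C": ["N", "Y", "Y"],
    "D": [1.0, 2.0, 3.0],
    "E": [10.0, 20.0, 30.0],
    "F": [100.0, 200.0, 300.0],
})

# A plain boolean array has no labels, so it lines up with D..F by position.
flags = df.loc[:, "A":"C"].to_numpy() == "Y"

# Replace flagged cells with 0, then sum each row.
df["total"] = df.loc[:, "D":"F"].mask(flags, 0).sum(axis=1)

print(df)
```

In the first row only A is 'Y', so D is zeroed and the total is 10 + 100.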

You can do it like this:

df["total"] = df.apply(lambda x: sum(x), axis=1).where((df['A'] > 0) & (df['B'] > 0) & (another_condition) & (another_condition), 0)

Note that this code sums all columns at once.
To sum only specific columns, you can do the following:

df['total'] = df[['A','B','C','D','E']].sum(axis=1).where((df['A'] > 0) & (df['B'] > 0) & (another_condition) & (another_condition), 0)
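
Since another_condition is a placeholder, here is a small sketch with made-up data and only two concrete conditions (the name row_ok is illustrative). Note that this approach works per row: a row keeps its full sum when every condition holds and gets 0 otherwise, which differs from SUMIF's cell-by-cell filtering.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": [1, -1, 2], "B": [3, 4, -5], "C": [10, 20, 30]})

# Row-level condition: both A and B must be positive.
row_ok = (df["A"] > 0) & (df["B"] > 0)

# Keep the row sum where the condition holds, otherwise use 0.
df["total"] = df[["A", "B", "C"]].sum(axis=1).where(row_ok, 0)

print(df)
```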
