小贝子编程

Summing multiple columns using a regular expression to select which columns to sum

Keywords: sum, select, regular expression, python, pandas
Updated: 2023-09-20

I want to do the following:

import pandas as pd

test = pd.DataFrame({'A1': [1, 1, 1, 1],
                     'A2': [1, 2, 2, 1],
                     'A3': [1, 1, 1, 1],
                     'B1': [1, 1, 1, 1],
                     'B2': [pd.NA, 1, 1, 1]})
result = pd.DataFrame({'A': test.filter(regex='A').sum(axis=1),
                       'B': test.filter(regex='B').sum(axis=1)})

I would like to know how to do this when there are many more columns and many more regex matches.

Use a dict comprehension instead of repeating the same code for each pattern, for example:

L = ['A','B']
df = pd.DataFrame({x: test.filter(regex=x).sum(axis=1) for x in L})
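One thing to watch with this approach: filter(regex=...) matches with re.search, so a pattern such as 'A' selects every column whose name contains an A anywhere. A minimal sketch, assuming the group prefixes sit at the start of the column names, anchors each pattern with ^ and keeps the output name separate from the pattern. The helper name sum_by_patterns, the frame demo, and the extra column 'BA1' are made up for illustration and are not part of the original question.

```python
import pandas as pd


def sum_by_patterns(df, patterns):
    """Row-wise sum of the columns matched by each regex.

    `patterns` maps an output column name to a regular expression.
    Hypothetical helper for illustration; not part of pandas.
    """
    return pd.DataFrame(
        {name: df.filter(regex=rx).sum(axis=1) for name, rx in patterns.items()}
    )


# Sample data; 'BA1' is an invented column whose name contains both letters.
demo = pd.DataFrame({
    'A1': [1, 1, 1, 1],
    'A2': [1, 2, 2, 1],
    'B1': [1, 1, 1, 1],
    'BA1': [10, 10, 10, 10],
})

# Unanchored: 'A' also matches 'BA1', so column A picks up the extra 10.
loose = sum_by_patterns(demo, {'A': 'A', 'B': 'B'})

# Anchored: '^A' only matches names that start with A.
strict = sum_by_patterns(demo, {'A': '^A', 'B': '^B'})

print(loose)
print(strict)
```

Whether anchoring is wanted depends on the naming scheme; with names like A1 and B2 both variants give the same result.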

Or, if possible, simplify the solution by grouping the columns on only the first letter of their names:

df = test.groupby(lambda x: x[0], axis=1).sum()
print (df)
   A    B
0  3  1.0
1  4  2.0
2  4  2.0
3  3  2.0
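Newer pandas releases deprecate the axis=1 argument of DataFrame.groupby (the deprecation was introduced in pandas 2.1), so the line above may warn or fail depending on the installed version. A sketch of an equivalent that should behave the same across versions is to transpose, group the rows, and transpose back. The frame demo below stands in for test, with a float NaN in place of pd.NA so that every column stays numeric; that substitution is an assumption made here for simplicity.

```python
import numpy as np
import pandas as pd

# Stand-in for `test`; np.nan replaces pd.NA (assumption, see lead-in).
demo = pd.DataFrame({
    'A1': [1, 1, 1, 1],
    'A2': [1, 2, 2, 1],
    'A3': [1, 1, 1, 1],
    'B1': [1, 1, 1, 1],
    'B2': [np.nan, 1, 1, 1],
})

# Transpose so the column names become the row index, group those
# labels by their first character, sum each group, then transpose back.
by_letter = demo.T.groupby(lambda name: name[0]).sum().T

print(by_letter)
```

Because the transpose converts the mixed integer and float columns to a common dtype, both result columns come back as floats here.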

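If the patterns should be joined with | into a single regex, extract the matching substring from every column name and group on it: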

vals = test.columns.str.extract('(A|B)', expand=False)
print (vals)
Index(['A', 'A', 'A', 'B', 'B'], dtype='object')
df = test.groupby(vals, axis=1).sum()
print (df)
   A    B
0  3  1.0
1  4  2.0
2  4  2.0
3  3  2.0
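The alternation can also be built from a list of prefixes rather than typed by hand. The sketch below assumes the prefixes are literal strings (hence re.escape) that appear at the start of the column names, and it groups on the transposed frame to avoid the axis=1 argument. Column names that match none of the prefixes get NaN as their key, and groupby drops NaN keys by default, so those columns are left out of the result. The names prefixes, demo, and the column 'id' are illustrative only.

```python
import re

import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'id': [100, 101, 102, 103],   # invented column that matches no prefix
    'A1': [1, 1, 1, 1],
    'A2': [1, 2, 2, 1],
    'B1': [1, 1, 1, 1],
    'B2': [np.nan, 1, 1, 1],
})

# Build one anchored alternation with a single capture group: '^(A|B)'.
prefixes = ['A', 'B']
pattern = '^(' + '|'.join(re.escape(p) for p in prefixes) + ')'

# One key per column: the captured prefix, or NaN when nothing matches.
keys = demo.columns.str.extract(pattern, expand=False)

# Group the transposed rows by those keys; the NaN key ('id') is dropped.
grouped = demo.T.groupby(keys.to_numpy()).sum().T

print(grouped)
```

If unmatched columns should be kept rather than dropped, one option is to fill the missing keys with a placeholder label before grouping.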
