如何在python熊猫中以快速和pythonic的方式调整和使用Excel的Sumif函数?



我正在进行一个项目,该项目试图将Excel中的函数和操作移植到panda中的python。

我尝试在数据中复制许多SUMIF函数。问题是,我不认为熊猫有特别的肛门功能。我可能有一个excel表达式,比如:

=(SUMIFS('Sheetname'!BI$146:BI$282,'Sheetname'!$B$146:$B$282,$I1836))

第一个论点是需要总结的地区。第二个区域是我们检查匹配条件的范围,最后一个参数是我们要查找的特定值。

我现在正在做的是运行一个嵌套循环,它对所有行和列进行迭代,并检查第一次迭代是否找到匹配的行,而内部循环是否找到匹配列。然后将这些值相加并输入pandas函数。

类似于:

table_dict_temp是我正在填充的表table_temp是要引用的表

for i in range(len(table_dict_temp)):
cog_loss = table_temp.loc[table_temp[COLUMN OF COMPARISON]==table_dict_temp[COLUMN OF COMPARISON][i]]
for j in range(10, len(table_dict_temp.columns)):
cog_loss_temp = cog_loss[table_dict_temp.columns[j]].sum()
table_dict_temp.iloc[i,j]=cog_loss_temp

我遇到的问题是,这似乎是一种非蟒蛇的方式,也需要很多时间。任何关于如何更快地编写函数的建议都将不胜感激!

Excel示例数据:

https://support.microsoft.com/en-us/office/sumifs-function-c9e748f5-7ea7-455d-9406-611cebce642b

Quantity Sold   Product Salesperson
5   Apples  Tom
4   Apples  Sarah
15  Artichokes  Tom
3   Artichokes  Sarah
22  Bananas Tom
12  Bananas Sarah
10  Carrots Tom
33  Carrots Sarah

Description
=SUMIFS(A2:A9, B2:B9, "=A*", C2:C9, "Tom")
Adds the number of products that begin with A and were sold by Tom. 
It uses the wildcard character * in Criteria1, "=A*" to look for matching product names in Criteria_range1 B2:B9, 
and looks for the name "Tom" in Criteria_range2 C2:C9. 
It then adds the numbers in Sum_range A2:A9 that meet both conditions. 
The result is 20.
=SUMIFS(A2:A9, B2:B9, "<>Bananas", C2:C9, "Tom")
Adds the number of products that aren’t bananas and are sold by Tom. 
It excludes bananas by using <> in the Criteria1, "<>Bananas", 
and looks for the name "Tom" in Criteria_range2 C2:C9. 
It then adds the numbers in Sum_range A2:A9 that meet both conditions. 
The result is 30.

蟒蛇解决方案:

import io
import pandas as pd
data_str = '''
Quantity Sold   Product Salesperson
5   Apples  Tom
4   Apples  Sarah
15  Artichokes  Tom
3   Artichokes  Sarah
22  Bananas Tom
12  Bananas Sarah
10  Carrots Tom
33  Carrots Sarah
'''.strip()
df = pd.read_csv(io.StringIO(data_str), sep='t')
# =SUMIFS(A2:A9, B2:B9, "=A*", C2:C9, "Tom")
cond = True
cond &= df['Product'].str.startswith('A')
cond &= df['Salesperson'] == 'Tom'
df.loc[cond, 'Quantity Sold'].sum()
# =SUMIFS(A2:A9, B2:B9, "<>Bananas", C2:C9, "Tom")
cond = True
cond &= df['Product'] != 'Bananas'
cond &= df['Salesperson'] == 'Tom'
df.loc[cond, 'Quantity Sold'].sum()

最新更新