我正在进行一个项目,该项目试图将Excel中的函数和操作移植到panda中的python。
我尝试在数据中复制许多SUMIF函数。问题是,我不认为熊猫有特别的肛门功能。我可能有一个excel表达式,比如:
=(SUMIFS('Sheetname'!BI$146:BI$282,'Sheetname'!$B$146:$B$282,$I1836))
第一个论点是需要总结的地区。第二个区域是我们检查匹配条件的范围,最后一个参数是我们要查找的特定值。
我现在正在做的是运行一个嵌套循环,它对所有行和列进行迭代,并检查第一次迭代是否找到匹配的行,而内部循环是否找到匹配列。然后将这些值相加并输入pandas函数。
类似于:
table_dict_temp是我正在填充的表table_temp是要引用的表
for i in range(len(table_dict_temp)):
cog_loss = table_temp.loc[table_temp[COLUMN OF COMPARISON]==table_dict_temp[COLUMN OF COMPARISON][i]]
for j in range(10, len(table_dict_temp.columns)):
cog_loss_temp = cog_loss[table_dict_temp.columns[j]].sum()
table_dict_temp.iloc[i,j]=cog_loss_temp
我遇到的问题是,这似乎是一种非蟒蛇的方式,也需要很多时间。任何关于如何更快地编写函数的建议都将不胜感激!
Excel示例数据:
https://support.microsoft.com/en-us/office/sumifs-function-c9e748f5-7ea7-455d-9406-611cebce642b
Quantity Sold Product Salesperson
5 Apples Tom
4 Apples Sarah
15 Artichokes Tom
3 Artichokes Sarah
22 Bananas Tom
12 Bananas Sarah
10 Carrots Tom
33 Carrots Sarah
Description
=SUMIFS(A2:A9, B2:B9, "=A*", C2:C9, "Tom")
Adds the number of products that begin with A and were sold by Tom.
It uses the wildcard character * in Criteria1, "=A*" to look for matching product names in Criteria_range1 B2:B9,
and looks for the name "Tom" in Criteria_range2 C2:C9.
It then adds the numbers in Sum_range A2:A9 that meet both conditions.
The result is 20.
=SUMIFS(A2:A9, B2:B9, "<>Bananas", C2:C9, "Tom")
Adds the number of products that aren’t bananas and are sold by Tom.
It excludes bananas by using <> in the Criteria1, "<>Bananas",
and looks for the name "Tom" in Criteria_range2 C2:C9.
It then adds the numbers in Sum_range A2:A9 that meet both conditions.
The result is 30.
蟒蛇解决方案:
import io
import pandas as pd
data_str = '''
Quantity Sold Product Salesperson
5 Apples Tom
4 Apples Sarah
15 Artichokes Tom
3 Artichokes Sarah
22 Bananas Tom
12 Bananas Sarah
10 Carrots Tom
33 Carrots Sarah
'''.strip()
df = pd.read_csv(io.StringIO(data_str), sep='t')
# =SUMIFS(A2:A9, B2:B9, "=A*", C2:C9, "Tom")
cond = True
cond &= df['Product'].str.startswith('A')
cond &= df['Salesperson'] == 'Tom'
df.loc[cond, 'Quantity Sold'].sum()
# =SUMIFS(A2:A9, B2:B9, "<>Bananas", C2:C9, "Tom")
cond = True
cond &= df['Product'] != 'Bananas'
cond &= df['Salesperson'] == 'Tom'
df.loc[cond, 'Quantity Sold'].sum()