Parameterizing a pandas groupby



Is there a way to parameterize a pandas groupby by passing in hard-coded lists of column names?

group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"
df = pd.read_csv(input_file_name)
df_total = df.groupby([group_by_cols])[aggregate_cols].sum()

Is this possible?

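If you need to pass lists, remove the [] around group_by_cols in groupby([group_by_cols]), because the extra brackets create a nested list: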

# [] added so that these are lists
group_by_cols = ["id","week_number"]
aggregate_cols = ["col1","col2","col3"]
print (type(group_by_cols))
<class 'list'>
df = pd.read_csv(input_file_name)
df_total = df.groupby(group_by_cols)[aggregate_cols].sum()
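Once the column names live in plain lists, the groupby can be wrapped in a small function. The following is only a minimal sketch: the function name `sum_by`, the up-front column check, and the tiny sample frame are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd


def sum_by(df, group_by_cols, aggregate_cols):
    """Group df by group_by_cols and sum aggregate_cols (hypothetical helper)."""
    # Fail early with a readable message if a requested column is absent.
    missing = [c for c in [*group_by_cols, *aggregate_cols] if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # Both arguments are flat lists, so they are passed through unchanged.
    return df.groupby(group_by_cols)[aggregate_cols].sum()


# Small made-up frame, used only to exercise the helper.
df = pd.DataFrame({
    "id": ["a", "a", "b"],
    "week_number": [4, 4, 5],
    "col1": [1, 2, 3],
    "col2": [10, 20, 30],
})

df_total = sum_by(df, ["id", "week_number"], ["col1", "col2"])
print(df_total)
```

Keeping the two lists as arguments means the same function can be reused with different grouping or aggregation columns without touching the groupby call itself.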

Or, if the inputs are tuples, convert them to lists. The assignments below create tuples:

group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"

They are equivalent to writing the tuples with parentheses, so wrap them in list() when calling groupby:

group_by_cols = ("id","week_number")
aggregate_cols = ("col1","col2","col3")
print (type(group_by_cols))
<class 'tuple'>
df = pd.read_csv(input_file_name)
df_total = df.groupby(list(group_by_cols))[list(aggregate_cols)].sum()
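If the column names may arrive as a list, a tuple, or a single string, the list() conversion above can be put in one small helper. This is a sketch under assumptions: the name `as_column_list` is hypothetical, and the special case for strings is an addition, needed because list("id") would split the string into the characters "i" and "d".

```python
def as_column_list(cols):
    """Return column names as a flat list (hypothetical helper)."""
    # A bare string is one column name; list("id") would give ["i", "d"].
    if isinstance(cols, str):
        return [cols]
    # Tuples, lists, and other iterables of names become a plain list.
    return list(cols)


group_by_cols = "id", "week_number"       # a tuple, as in the question
aggregate_cols = "col1", "col2", "col3"   # also a tuple

print(as_column_list(group_by_cols))
print(as_column_list(aggregate_cols))
print(as_column_list("id"))
```

The results can then be used as `df.groupby(as_column_list(group_by_cols))[as_column_list(aggregate_cols)].sum()`.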

Test with sample data:

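import pandas as pd

# sample data; col4 is deliberately left out of the aggregation
df = pd.DataFrame({
    'id': list('aaaabb'),
    'week_number': [4, 5, 4, 5, 5, 5],
    'col1': [7, 8, 9, 4, 2, 3],
    'col2': [1, 3, 5, 7, 1, 0],
    'col3': [5, 3, 6, 9, 2, 4],
    'col4': [4, 3, 3, 0, 3, 9]
})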

group_by_cols = ["id","week_number"]
aggregate_cols = ["col1","col2","col3"]
df_total = df.groupby(group_by_cols)[aggregate_cols].sum()
print (df_total)
                col1  col2  col3
id week_number                  
a  4              16     6    11
   5              12    10    12
b  5               5     1     6

group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"
df_total = df.groupby(list(group_by_cols))[list(aggregate_cols)].sum()
print (df_total)
                col1  col2  col3
id week_number                  
a  4              16     6    11
   5              12    10    12
b  5               5     1     6
