Python pandas: adding a Total row to a DataFrame based on a filter condition



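I have a file that compares different pieces of information across different views of an underlying dataset. The goal is to list that information and compare the totals.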

I have the following DataFrame:

import pandas

df = pandas.DataFrame({
    "Measures": ['Country', 'State', 'County', 'City'],
    "Green": ['Included', 'Excluded', 'Included', 'Included'],
    "Orange": ['Excluded', 'Excluded', 'Excluded', 'Included'],
})

I have the following underlying dataset:

Location    Green    Orange
Country     1        6
State       3        10
County      2        15
City        5        20
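
For anyone who wants to reproduce this, the table above could be built as a DataFrame roughly as follows. This is only a sketch: the variable name df2 is an assumption, since the question does not say how the dataset is loaded.

```python
import pandas as pd

# Underlying dataset, copied from the table above.
# The name df2 is an assumption made for illustration.
df2 = pd.DataFrame({
    "Location": ["Country", "State", "County", "City"],
    "Green": [1, 3, 2, 5],
    "Orange": [6, 10, 15, 20],
})
```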

I would like the final result to look like this:

Measures    Green    Orange
Country     Included Excluded
State       Excluded Excluded
County      Included Excluded
City        Included Included
Total       8        20

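Before computing the sums, you can use df as a mask over the underlying DataFrame, so that only the values marked 'Included' are added up.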

m = df.eq('Included')   
# Assume df2 is your underlying DataFrame.
v = df2[m].sum()
# Assign the total back as a new row in df.    
df.loc['Total', :] = v[df2.dtypes != object]
df
      Measures     Green    Orange
0      Country  Included  Excluded
1        State  Excluded  Excluded
2       County  Included  Excluded
3         City  Included  Included
Total      NaN         8        20
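
A self-contained variant of the masking approach, as a sketch rather than the exact code above. It restricts the mask to the value columns, which are assumed to have the same names in both frames, and it writes the label "Total" into the Measures column instead of the index. The names value_cols, totals, total_row, and result are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "Measures": ["Country", "State", "County", "City"],
    "Green": ["Included", "Excluded", "Included", "Included"],
    "Orange": ["Excluded", "Excluded", "Excluded", "Included"],
})
# Underlying values; assumed to be in the same row order as df.
df2 = pd.DataFrame({
    "Location": ["Country", "State", "County", "City"],
    "Green": [1, 3, 2, 5],
    "Orange": [6, 10, 15, 20],
})

value_cols = ["Green", "Orange"]

# True where a value should count towards the total.
mask = df[value_cols].eq("Included")

# where() keeps values under a True mask and replaces the rest with NaN;
# sum() skips NaN, so only the included values are added up.
totals = df2[value_cols].where(mask).sum().astype(int)

# Append the totals as a new row labelled "Total".
total_row = pd.DataFrame([{"Measures": "Total", **totals.to_dict()}])
result = pd.concat([df, total_row], ignore_index=True)
print(result)
```

Here totals holds Green = 8 and Orange = 20, matching the desired result. Building the row with pd.concat avoids the NaN that otherwise appears in the Measures column.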

If you want output that is closer to the desired result, another option is to set "Measures" and "Location" as the index of df and df2, respectively.

df = df.set_index('Measures')
df2 = df2.set_index('Location')
m = df.eq('Included') 
v = df2[m].sum()
df.loc['Total', :] = v
df
             Green    Orange
Measures                    
Country   Included  Excluded
State     Excluded  Excluded
County    Included  Excluded
City      Included  Included
Total            8        20
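
One side effect of indexing both frames by their labels is that the mask is matched to the values by label rather than by position. The sketch below illustrates this with the underlying rows deliberately shuffled. The names flags, values, totals, and result are hypothetical, and the cast to object is an assumption made so that integers can be stored next to the strings regardless of the default string dtype.

```python
import pandas as pd

flags = pd.DataFrame({
    "Measures": ["Country", "State", "County", "City"],
    "Green": ["Included", "Excluded", "Included", "Included"],
    "Orange": ["Excluded", "Excluded", "Excluded", "Included"],
}).set_index("Measures")

# Same underlying data, but with the rows in a different order.
values = pd.DataFrame({
    "Location": ["City", "Country", "County", "State"],
    "Green": [5, 1, 2, 3],
    "Orange": [20, 6, 15, 10],
}).set_index("Location")

# where() aligns the boolean mask to values on index and column labels,
# so the differing row order does not matter.
mask = flags.eq("Included")
totals = values.where(mask).sum().astype(int)

# Cast to object so the integer totals can sit alongside the strings.
result = flags.astype(object)
result.loc["Total"] = totals
print(result)
```

The totals are still 8 for Green and 20 for Orange, because each mask entry is paired with the value that carries the same label.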
