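I have a file that compares different pieces of information across different views of an underlying dataset. The goal is to list that information and compare the totals.
I have the following DataFrame: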
import pandas

df = pandas.DataFrame({
    "Measures": ['Country', 'State', 'County', 'City'],
    "Green": ['Included', 'Excluded', 'Included', 'Included'],
    "Orange": ['Excluded', 'Excluded', 'Excluded', 'Included'],
})
I have the following underlying dataset:
Location Green Orange
Country 1 6
State 3 10
County 2 15
City 5 20
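To try out the answers below, the table above can be loaded into a DataFrame. This is a sketch that assumes the data is named df2 and keeps Location as an ordinary column, which is what the first answer expects. The values are copied from the table.

```python
import pandas as pd

# Assumed reconstruction of the underlying dataset as a DataFrame;
# the name df2 and the "Location" column match the answer below
df2 = pd.DataFrame({
    "Location": ["Country", "State", "County", "City"],
    "Green": [1, 3, 2, 5],
    "Orange": [6, 10, 15, 20],
})
print(df2)
```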
I want the final result to look like this:
Measures Green Orange
Country Included Excluded
State Excluded Excluded
County Included Excluded
City Included Included
Total 8 20
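The Total row is the sum of each column's underlying values over the rows marked Included. For Green that is Country + County + City = 1 + 2 + 5 = 8; for Orange only City counts, giving 20. A plain-Python check of that arithmetic, with the flags and values copied from the two tables above:

```python
# Included/Excluded flags and underlying values, copied from the tables above
flags = {
    "Green": ["Included", "Excluded", "Included", "Included"],
    "Orange": ["Excluded", "Excluded", "Excluded", "Included"],
}
values = {
    "Green": [1, 3, 2, 5],
    "Orange": [6, 10, 15, 20],
}

# Sum each column's values only where the matching flag is "Included"
totals = {
    col: sum(v for f, v in zip(flags[col], values[col]) if f == "Included")
    for col in flags
}
print(totals)  # {'Green': 8, 'Orange': 20}
```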
You can use df as a mask over the values of the underlying DataFrame before computing the sums:
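m = df.eq('Included')
# Assume df2 is your underlying DataFrame (columns Location, Green, Orange).
# df2[m] keeps the cells where m is True and sets every other cell to NaN.
v = df2[m].sum()
# Keep only the numeric columns and assign the totals back as a new row in df.
df.loc['Total', :] = v[df2.dtypes != object]
df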
Measures Green Orange
0 Country Included Excluded
1 State Excluded Excluded
2 County Included Excluded
3 City Included Included
Total NaN 8 20
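A self-contained sketch of the approach above, assuming df2 holds the question's table with Location as an ordinary column. It calls where() explicitly (what df2[m] does for a boolean DataFrame key) and selects the numeric columns with select_dtypes before masking instead of filtering by dtype afterwards; both choices are swapped in for readability and are not taken from the answer. Because the mask introduces NaN, the sums come back as floats, so the sketch casts them to int before appending the row.

```python
import pandas as pd

df = pd.DataFrame({
    "Measures": ["Country", "State", "County", "City"],
    "Green": ["Included", "Excluded", "Included", "Included"],
    "Orange": ["Excluded", "Excluded", "Excluded", "Included"],
})
# Assumed layout of the underlying data, copied from the question's table
df2 = pd.DataFrame({
    "Location": ["Country", "State", "County", "City"],
    "Green": [1, 3, 2, 5],
    "Orange": [6, 10, 15, 20],
})

# True wherever a cell of df reads "Included"
m = df.eq("Included")
# Keep only the numeric columns of df2 (Green, Orange) before masking
numeric = df2.select_dtypes("number")
# where() aligns m to numeric by row and column label and replaces every
# cell where m is False with NaN, so the int columns become float
masked = numeric.where(m)
# sum() skips NaN; cast the float totals back to int
totals = masked.sum().astype(int)
# Append the totals as a new row; "Measures" has no total and stays NaN
df.loc["Total", :] = totals
print(df)
```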
If you want output closer to what you asked for, another option is to set "Measures" and "Location" as the index of each DataFrame, respectively:
# Index both frames by their row labels so the mask lines up row by row.
df = df.set_index('Measures')
df2 = df2.set_index('Location')
m = df.eq('Included')
v = df2[m].sum()
df.loc['Total', :] = v
df
Green Orange
Measures
Country Included Excluded
State Excluded Excluded
County Included Excluded
City Included Included
Total 8 20
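A variant of the indexed approach, swapped in here rather than taken from the answer: multiply the values by the boolean mask instead of indexing with it. This is a sketch assuming both frames share the same row and column labels after set_index. Excluded cells become 0 rather than NaN, so the totals keep an integer dtype.

```python
import pandas as pd

# Both frames indexed by their row labels, as in the answer above
df = pd.DataFrame({
    "Measures": ["Country", "State", "County", "City"],
    "Green": ["Included", "Excluded", "Included", "Included"],
    "Orange": ["Excluded", "Excluded", "Excluded", "Included"],
}).set_index("Measures")
# Assumed layout of the underlying data, copied from the question's table
df2 = pd.DataFrame({
    "Location": ["Country", "State", "County", "City"],
    "Green": [1, 3, 2, 5],
    "Orange": [6, 10, 15, 20],
}).set_index("Location")

# True * value keeps the value, False * value gives 0; mul() aligns the
# two frames on their index and column labels before multiplying
totals = df.eq("Included").mul(df2).sum()
df.loc["Total"] = totals
print(df)
```

If a row or column label exists in only one of the frames, mul() fills that cell with NaN, sum() skips it, and the totals fall back to float.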