我想从pandas数据透视表中获得计数的数据框,但对于聚合函数,要包括每个索引。例如
df1 = pd.DataFrame({"A": ["foo", "bar", "shtuff"],
"B": ["one", "two", "one"],
"C": [1, 2, 3]})
df2 = pd.DataFrame({"C": [1,2,2,1,2,1,1,2,2],
"D": ["A", "B", "A", "B", "A",
"B", "A", "C", "A"]})
df = df1.merge(df2,how='left')
pd.pivot_table(df, index='A',columns='D', values='C', aggfunc='count', fill_value=0)
将导致
D A B C
A
bar 3 1 1
foo 2 2 0
但是我想得到
D A B C
A
bar 3 1 1
foo 2 2 0
shtuff 0 0 0
我怎样才能做到这一点?我一直在看文档,但似乎找不到答案。提前感谢:)
您可以使用Categorical
:
df1['A'] = pd.Categorical(df1['A'])
df = df1.merge(df2, how='left')
pd.pivot_table(df, index='A',columns='D', values='C', aggfunc='count',
fill_value=0, observed=False)
注意您的pivot_table
相当于crosstab
:
pd.crosstab(df['A'], df['D'], dropna=False)
输出:
D A B C
A
bar 3 1 1
foo 2 2 0
shtuff 0 0 0
或者,reindex
:
pd.pivot_table(df, index='A',columns='D', values='C', aggfunc='count',
fill_value=0, observed=False
).reindex(df['A'].unique(), fill_value=0)