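Sorry if this looks like a duplicate. I found many close answers that use groupby and size, but none of them return the column headers as the index.
I have the following df (it actually has 340 columns and many rows):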
import pandas as pd
data = {'Name_Clean_40_40_Correct':['0','1','0','0'], 'Name_Clean_40_80_Correct':['0','1','1','N/A'],'Name_Clean_40_60_Correct':['N/A','N/A','0','1']}
df_third = pd.DataFrame(data)
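I'm trying to count the occurrences of "0", "1" and "N/A" in each column, so I'd like the index to be the column names and the columns to be "0", "1" and "N/A".
I tried this, but I'm afraid it is inefficient or incorrect, because it never finishes: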
def countx(x, colname):
df_thresholds=df_third.groupby(colname).count()
for col in df_thresholds.columns:
df_thresholds[col + '_Count'] = df_third.apply(countx, axis=1, args=(col,))
I can do it for one column, but repeating that for every column would be painful:
df_thresholds=df_third.groupby('Name_Clean_100_100_Correct').count()
df_thresholds=df_thresholds[['Name_Raw']]
df_thresholds=df_thresholds.T
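The one-column groupby above can be generalized by running it for every column and stitching the results together. A minimal sketch, assuming the sample df_third from the question; it uses groupby(...).size() (rows per group) instead of count(), so it does not need a second column such as Name_Raw. Note that groupby drops NaN keys by default; that is not an issue here because "N/A" is a plain string.

```python
import pandas as pd

data = {'Name_Clean_40_40_Correct': ['0', '1', '0', '0'],
        'Name_Clean_40_80_Correct': ['0', '1', '1', 'N/A'],
        'Name_Clean_40_60_Correct': ['N/A', 'N/A', '0', '1']}
df_third = pd.DataFrame(data)

# Run the one-column groupby for every column; size() counts the rows in each group.
per_column = {col: df_third.groupby(col).size() for col in df_third.columns}

# concat() with a dict uses the keys (column names) as column labels,
# then .T turns them into the row index. Values absent from a column become NaN -> 0.
df_thresholds = pd.concat(per_column, axis=1).T.fillna(0).astype(int)
print(df_thresholds)
```

This still loops in Python over the 340 columns, but each step is a single vectorized groupby, so it should finish in reasonable time.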
If I understand you correctly, this should work:
df_third.apply(pd.Series.value_counts)
Result:
Name_Clean_40_40_Correct ... Name_Clean_40_60_Correct
0 3.0 ... 1
1 1.0 ... 1
N/A NaN ... 2
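The NaN (and the float 3.0/1.0 in the first column) appear because each column's value_counts() only lists the values that actually occur in that column; when the per-column results are aligned, a missing value such as "N/A" in Name_Clean_40_40_Correct is filled with NaN. A small sketch of turning those into zero counts, assuming the sample df_third from the question:

```python
import pandas as pd

data = {'Name_Clean_40_40_Correct': ['0', '1', '0', '0'],
        'Name_Clean_40_80_Correct': ['0', '1', '1', 'N/A'],
        'Name_Clean_40_60_Correct': ['N/A', 'N/A', '0', '1']}
df_third = pd.DataFrame(data)

# Count each value per column; values missing from a column come back as NaN.
counts = df_third.apply(pd.Series.value_counts)

# Treat "not present" as a count of 0 and restore integer counts.
counts = counts.fillna(0).astype(int)
print(counts)
```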
BTW: to select only the columns whose name contains "Correct":
df_third.filter(like='Correct')
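Putting the filter and the counting together: a sketch that keeps only the "Correct" columns, counts values per column and transposes so the column names become the index. To show the filter doing something, it adds a Name_Raw column (the name comes from the question's single-column snippet; its values here are made up purely for illustration).

```python
import pandas as pd

data = {'Name_Clean_40_40_Correct': ['0', '1', '0', '0'],
        'Name_Clean_40_80_Correct': ['0', '1', '1', 'N/A'],
        'Name_Clean_40_60_Correct': ['N/A', 'N/A', '0', '1']}
df_third = pd.DataFrame(data)
df_third['Name_Raw'] = ['a', 'b', 'c', 'd']  # hypothetical extra column, illustration only

counts = (
    df_third.filter(like='Correct')     # drop columns without 'Correct' in the name
    .apply(pd.Series.value_counts)      # one column of counts per original column
    .T                                  # original column names become the index
)
print(counts)
```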
Transposed form, i.e. df_third.apply(pd.Series.value_counts).T:
                          0    1    N/A
Name_Clean_40_40_Correct  3.0  1.0  NaN
Name_Clean_40_80_Correct  1.0  2.0  1.0
Name_Clean_40_60_Correct  1.0  1.0  2.0
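An alternative that produces this orientation directly, swapped in here rather than taken from the answer above: reshape to long form with melt() and count (column, value) pairs with pd.crosstab(), which reports absent pairs as 0 instead of NaN. A sketch assuming the sample df_third; crosstab sorts the row labels, so the last step reorders them to match the original column order.

```python
import pandas as pd

data = {'Name_Clean_40_40_Correct': ['0', '1', '0', '0'],
        'Name_Clean_40_80_Correct': ['0', '1', '1', 'N/A'],
        'Name_Clean_40_60_Correct': ['N/A', 'N/A', '0', '1']}
df_third = pd.DataFrame(data)

# melt() gives one row per cell: 'variable' holds the column name, 'value' the cell.
long_form = df_third.melt()

# crosstab() counts each (column name, value) pair; missing pairs are 0, not NaN.
counts = pd.crosstab(long_form['variable'], long_form['value'])

# crosstab sorts the rows by label; reindex restores the original column order.
counts = counts.reindex(df_third.columns)
print(counts)
```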