I am working with a multi-response dataset and want to build some frequency tables using Python pandas. Here is my dataset:
Student Id |1st_Lang |2nd_Lang |Core_Sub_1 |Core_Sub_2 |Core_Sub_3 |Additional
1 |Bengali |English |Math |Life Sc |Physical Sc |Work Education
2 |Bengali |English |Geography |Life Sc |Physical Sc |Physical Education
3 |Bengali |English |History |Geography |Economics |Life Sc
4 |English |Hindi |History |Geography |Economics |Life Sc
5 |Hindi |English |Math |Life Sc |Physical Sc |Work Education
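This is sample student data: a student ID plus the subjects each student chose as languages, core subjects, and an additional subject.
I want to generate the frequency of each subject the students are studying.
Example: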
English - 5
Bengali - 3
Hindi - 2
Geography - 3
... etc.
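I would also like to know the frequency of the subjects studied by those students whose language (in the 1st_Lang or 2nd_Lang column) is English or Hindi.
Could you please help me do this in Python?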
Since we don't need it for counting, we move "Student Id" out of the way by making it the index (or by dropping it):
df = df.set_index("Student Id")
# df = df.drop(columns="Student Id")
1st_Lang 2nd_Lang Core_Sub_1 Core_Sub_2 Core_Sub_3 Additional
Student Id
1 Bengali English Math Life Sc Physical Sc Work Education
2 Bengali English Geography Life Sc Physical Sc Physical Education
3 Bengali English History Geography Economics Life Sc
4 English Hindi History Geography Economics Life Sc
5 Hindi English Math Life Sc Physical Sc Work Education
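To follow along, the sample table first has to exist as a DataFrame. The sketch below is one way to build it, assuming the values are typed in by hand exactly as they appear in the question; in practice the data would more likely be read from a file.

```python
import pandas as pd

# Sample data copied from the question's table (typed in by hand here).
df = pd.DataFrame({
    "Student Id": [1, 2, 3, 4, 5],
    "1st_Lang": ["Bengali", "Bengali", "Bengali", "English", "Hindi"],
    "2nd_Lang": ["English", "English", "English", "Hindi", "English"],
    "Core_Sub_1": ["Math", "Geography", "History", "History", "Math"],
    "Core_Sub_2": ["Life Sc", "Life Sc", "Geography", "Geography", "Life Sc"],
    "Core_Sub_3": ["Physical Sc", "Physical Sc", "Economics", "Economics", "Physical Sc"],
    "Additional": ["Work Education", "Physical Education", "Life Sc", "Life Sc", "Work Education"],
})

# Keep the ID as the index so it is not counted as a subject later.
df = df.set_index("Student Id")
print(df)
```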
Stacking df gives us a Series (with a MultiIndex of student ID and column name):
ser = df.stack()
Student Id
1 1st_Lang Bengali
2nd_Lang English
Core_Sub_1 Math
Core_Sub_2 Life Sc
Core_Sub_3 Physical Sc
Additional Work Education
2 1st_Lang Bengali
2nd_Lang English
Core_Sub_1 Geography
Core_Sub_2 Life Sc
Core_Sub_3 Physical Sc
Additional Physical Education
3 1st_Lang Bengali
2nd_Lang English
Core_Sub_1 History
Core_Sub_2 Geography
Core_Sub_3 Economics
Additional Life Sc
4 1st_Lang English
2nd_Lang Hindi
Core_Sub_1 History
Core_Sub_2 Geography
Core_Sub_3 Economics
Additional Life Sc
5 1st_Lang Hindi
2nd_Lang English
Core_Sub_1 Math
Core_Sub_2 Life Sc
Core_Sub_3 Physical Sc
Additional Work Education
dtype: object
Now we can count the frequencies:
ser.value_counts()
Life Sc 5
English 5
Physical Sc 3
Bengali 3
Geography 3
Work Education 2
Hindi 2
Math 2
History 2
Economics 2
Physical Education 1
dtype: int64
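The stack-then-count step can be wrapped in a small helper and checked end to end. This is only a sketch: `subject_frequencies` is a hypothetical name, not a pandas function, and the rows are retyped from the question. The order of tied entries may differ from the listing above depending on the pandas version.

```python
import pandas as pd

def subject_frequencies(frame):
    """Count how often each value occurs across all columns of `frame`."""
    # stack() turns the wide table into one long Series of chosen subjects;
    # value_counts() then tallies each distinct subject.
    return frame.stack().value_counts()

# Rows retyped from the question's table.
columns = ["Student Id", "1st_Lang", "2nd_Lang", "Core_Sub_1",
           "Core_Sub_2", "Core_Sub_3", "Additional"]
rows = [
    (1, "Bengali", "English", "Math", "Life Sc", "Physical Sc", "Work Education"),
    (2, "Bengali", "English", "Geography", "Life Sc", "Physical Sc", "Physical Education"),
    (3, "Bengali", "English", "History", "Geography", "Economics", "Life Sc"),
    (4, "English", "Hindi", "History", "Geography", "Economics", "Life Sc"),
    (5, "Hindi", "English", "Math", "Life Sc", "Physical Sc", "Work Education"),
]
df = pd.DataFrame(rows, columns=columns).set_index("Student Id")

freq = subject_frequencies(df)
print(freq)
```

Because every student appears in exactly one row and picks each subject at most once, each count is also the number of students studying that subject.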
Now for the students who study Hindi. First set up the criterion by comparing both language columns with "Hindi":
critH = df[["1st_Lang", "2nd_Lang"]].eq("Hindi")
1st_Lang 2nd_Lang
Student Id
1 False False
2 False False
3 False False
4 False True
5 True False
We accept Hindi as either the first or the second language, so a row matches if any of the two columns is True:
critH = critH.any(axis=1)
Student Id
1 False
2 False
3 False
4 True
5 True
dtype: bool
Select the matching rows (students) and count the frequencies in one step:
df.loc[critH].stack().value_counts()
Life Sc 2
Hindi 2
English 2
History 1
Work Education 1
Math 1
Economics 1
Physical Sc 1
Geography 1
dtype: int64
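The question asked about students studying English or Hindi, while the steps above filter on Hindi only. A minimal sketch of the "either language" variant follows, assuming `isin` is an acceptable replacement for `eq` and that the pipe-separated table from the question is parsed with a regular-expression separator; the names `raw`, `wanted` and `crit` are made up for this example.

```python
import io
import pandas as pd

# The question's table as pipe-separated text (copied from the post).
raw = """Student Id |1st_Lang |2nd_Lang |Core_Sub_1 |Core_Sub_2 |Core_Sub_3 |Additional
1 |Bengali |English |Math |Life Sc |Physical Sc |Work Education
2 |Bengali |English |Geography |Life Sc |Physical Sc |Physical Education
3 |Bengali |English |History |Geography |Economics |Life Sc
4 |English |Hindi |History |Geography |Economics |Life Sc
5 |Hindi |English |Math |Life Sc |Physical Sc |Work Education
"""

# Split on "|" and swallow the surrounding spaces; a regex separator
# requires the python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=r"\s*\|\s*", engine="python")
df = df.set_index("Student Id")

# A student matches if either language column holds one of the wanted values.
wanted = ["English", "Hindi"]
crit = df[["1st_Lang", "2nd_Lang"]].isin(wanted).any(axis=1)

counts = df.loc[crit].stack().value_counts()
print(counts)
```

In this sample every student has English as a first or second language, so the filter keeps all five rows and the result equals the overall frequencies; with other data the selection would be narrower.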