I have a table of students. How can I find the number of students who passed exactly one exam? A pass means scoring 40 or higher.
Sample table:
student exam score
123 Math 42
123 IT 39
321 Math 12
321 IT 11
333 IT 66
333 Math 77
For this example:
count of students = 1  # answer
student 123 passed 1 exam
student 333 passed 2 exams
student 321 passed 0 exams
I tried using groupby(),
but I can't figure out how to use filter().
I think this is what you want:
df_grouped = df.groupby("student")["score"].agg(lambda x: (x >= 40).sum())
df_grouped[df_grouped == 1]
If you only want the count, just use len(df_grouped[df_grouped == 1])
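For completeness, here is a minimal self-contained sketch of the approach above, built on the sample table from the question. The variable names passes_per_student and exactly_one are only illustrative and do not appear in the original answer. Note that grouping first keeps students with no passes in the result, with a count of 0.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# For each student, count the exams with score >= 40.
# Students without any pass remain in the result with a count of 0.
passes_per_student = df.groupby("student")["score"].agg(lambda x: (x >= 40).sum())
print(passes_per_student)

# Keep only the students with exactly one passed exam, then count them
exactly_one = passes_per_student[passes_per_student == 1]
print(len(exactly_one))
```

With the sample data, only student 123 remains in exactly_one, so the count is 1.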
Code:
import pandas as pd
data = {
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
}
df = pd.DataFrame(data)
# Keep only the passing rows (score >= 40), then count them per student
num_passed_exams = df[df['score'] >= 40].groupby('student').size()
print(num_passed_exams)
# Count the students with exactly one passed exam
res = len(num_passed_exams[num_passed_exams == 1])
print("Student who passed at exactly one exam:", res)
Output:
student
123 1
333 2
dtype: int64
Student who passed at exactly one exam: 1
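One thing worth noting about this approach: because the rows are filtered before grouping, student 321 (no passes) does not appear in the output at all. That does not affect the "exactly one" count, but it matters if you also need students with zero passes. The following is a rough sketch of one way to restore them with reindex; the names all_students and passes_all are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# Filter first, then count: students with no passing row are dropped here
passed = df[df["score"] >= 40].groupby("student").size()

# Re-add every student id seen in the table, filling missing counts with 0
all_students = df["student"].unique()
passes_all = passed.reindex(all_students, fill_value=0)
print(passes_all)

# Number of students with exactly one pass, and with no passes at all
print((passes_all == 1).sum())
print((passes_all == 0).sum())
```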
You can do
out = (df['score'].ge(40) # Is the score greater than or equal to 40?
.groupby(df['student']).sum() # For each student, how many exams did they pass (score >= 40)?
.eq(1).sum()) # How many students passed exactly one exam?
print(out)
1
If you want the ids of the students who passed exactly one exam
out = (df['score'].ge(40)
.groupby(df['student']).sum()
.eq(1).loc[lambda s: s].index.tolist())
print(out)
[123]
If you want the exams and scores of those students, you can use groupby.filter
out = df.groupby('student').filter(lambda g: g['score'].ge(40).sum() == 1)
print(out)
student exam score
0 123 Math 42
1 123 IT 39
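If the pass mark or the number of passes might change, the same idea can be wrapped in a small function. This is only a sketch: students_with_n_passes and its parameters n and pass_mark are hypothetical names, not part of pandas or of the answers above.

```python
import pandas as pd

def students_with_n_passes(df, n=1, pass_mark=40):
    """Return the ids of students who passed exactly n exams (hypothetical helper)."""
    # Boolean Series: True where the score reaches the pass mark
    passed = df["score"].ge(pass_mark)
    # Summing the booleans per student gives the number of passed exams
    passes = passed.groupby(df["student"]).sum()
    return passes[passes.eq(n)].index.tolist()

# Sample data copied from the question
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

one_pass = students_with_n_passes(df)        # exactly one passed exam
two_passes = students_with_n_passes(df, n=2) # exactly two passed exams
no_pass = students_with_n_passes(df, n=0)    # no passed exams
print(one_pass, two_passes, no_pass)
```

On the sample table this gives student 123 for one pass, 333 for two passes, and 321 for none.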