How to use groupby with a filter in pandas?



I have a table of students. How can we count the students who passed exactly one exam? A pass means a score of 40 or above.

Example table

student exam   score
123     Math   42
123     IT     39
321     Math   12
321     IT     11
333     IT     66
333     Math   77

For this example:

count of students = 1  # answer
student 123 passed 1 exam
student 333 passed 2 exams
student 321 passed 0 exams

I am using groupby() but can't work out how to use filter().

I think this is what you want:

df_grouped = df.groupby("student")["score"].agg(lambda x: (x >= 40).sum())
df_grouped[df_grouped == 1]

If you only want the count, just use len(df_grouped[df_grouped == 1]).
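For reference, here is a minimal self-contained sketch of this approach, using the sample table from the question and the pass mark of 40 or above. The variable names `passed_per_student`, `exactly_one` and `count_exactly_one` are made up for illustration only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# Number of passed exams (score >= 40) per student.
passed_per_student = df.groupby("student")["score"].agg(lambda s: (s >= 40).sum())

# Keep only the students with exactly one passed exam, then count them.
exactly_one = passed_per_student[passed_per_student == 1]
count_exactly_one = len(exactly_one)

print(exactly_one.index.tolist(), count_exactly_one)
```

Because the aggregation runs over every student, students with no passed exams stay in `passed_per_student` with a value of 0.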

Code:

import pandas as pd

data = {
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
}
df = pd.DataFrame(data)

# Keep only passing rows (score >= 40), then count them per student
num_passed_exams = df[df["score"] >= 40].groupby("student").size()
print(num_passed_exams)

# Number of students with exactly one passed exam
res = len(num_passed_exams[num_passed_exams == 1])
print("Students who passed exactly one exam:", res)

Output:

student
123    1
333    2
dtype: int64
Students who passed exactly one exam: 1
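Note that filtering the rows before grouping drops student 321 from the result, since that student has no passing rows. If the zero counts from the question's expected output are wanted as well, one possible sketch (an assumption about what is needed, not part of the answer above) is to reindex the counts over all student IDs. The names `all_students` and `passed_all` are hypothetical.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# Passing rows only, counted per student (students with no pass are absent here).
passed = df[df["score"] >= 40].groupby("student").size()

# Re-add the missing students with a count of 0.
all_students = df["student"].unique()
passed_all = passed.reindex(all_students, fill_value=0)

# Number of students with exactly one passed exam.
n_exactly_one = int((passed_all == 1).sum())

print(passed_all)
print(n_exactly_one)
```

The count of students with exactly one pass is the same either way. The reindex only matters when the per-student breakdown should list everyone.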

You can do:

out = (df['score'].ge(40)             # Is the score greater than or equal to 40?
.groupby(df['student']).sum()  # For each student, how many exams were passed (score >= 40)?
.eq(1).sum())                  # How many students passed exactly one exam?
print(out)
1

If you want the IDs of the students who passed exactly one exam:

out = (df['score'].ge(40)
.groupby(df['student']).sum()
.eq(1).loc[lambda s: s].index.tolist())
print(out)
[123]

If you want the exams and scores of those students, you can use groupby.filter:

out = df.groupby('student').filter(lambda g: g['score'].ge(40).sum() == 1)
print(out)
   student  exam  score
0      123  Math     42
1      123    IT     39
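Since groupby.filter returns the original rows rather than one row per student, the count asked for in the question still has to be derived from it. A minimal sketch, assuming the sample table from the question (the names `one_pass_rows`, `student_ids` and `n_students` are illustrative only):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "student": [123, 123, 321, 321, 333, 333],
    "exam": ["Math", "IT", "Math", "IT", "IT", "Math"],
    "score": [42, 39, 12, 11, 66, 77],
})

# Keep every row belonging to a student with exactly one passed exam.
one_pass_rows = df.groupby("student").filter(
    lambda g: g["score"].ge(40).sum() == 1
)

# Distinct student IDs in the filtered rows, and how many there are.
student_ids = one_pass_rows["student"].unique().tolist()
n_students = one_pass_rows["student"].nunique()

print(student_ids, n_students)
```

This calls the lambda once per group, so on large tables the vectorised `ge(40)` followed by a grouped `sum()` is likely to be faster if only the count is needed.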
