I have a DataFrame like this:
import numpy as np
import pandas as pd

students = {'ID': [2, 3, 5, 7, 11, 13],
            'Name': ['John', 'Jane', 'Sam', 'James', 'Stacy', 'Mary'],
            'Gender': ['M', 'F', 'F', 'M', 'F', 'F'],
            'school_name': ['College2', 'College2', 'College10', 'College2', 'College2', 'College2'],
            'grade': ['9th', '10th', '9th', '9th', '8th', '5th'],
            'math_score': [90, 89, 88, 89, 89, 90],
            'art_score': [90, 89, 89, 78, 90, 94]}
students_df = pd.DataFrame(students)
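Can I use the loc method on students_df to select all math_score and art_score values for 9th graders at College2 and replace them with NaN? Is there a clean way to do this without splitting the process into two parts, one for subsetting and another for replacing?
I tried selecting like this: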
students_df.loc[(students_df['school_name'] == 'College2') & (students_df['grade'] == "9th"),['grade','school_name','math_score','art_score']]
And I replaced the values like this:
students_df['math_score'] = np.where((students_df['school_name'] == 'College2') & (students_df['grade'] == '9th'), np.nan, students_df['math_score'])
Using loc and np.nan, can I achieve the same thing in a cleaner, more efficient way?
Select the rows with the boolean mask and the columns you want to overwrite in the same loc call, then assign NaN:
students_df.loc[(students_df['school_name'] == 'College2') & (students_df['grade'] == "9th"),['math_score','art_score']] = np.nan
print(students_df)
   ID   Name Gender school_name grade  math_score  art_score
0   2   John      M    College2   9th         NaN        NaN
1   3   Jane      F    College2  10th        89.0       89.0
2   5    Sam      F   College10   9th        88.0       89.0
3   7  James      M    College2   9th         NaN        NaN
4  11  Stacy      F    College2   8th        89.0       90.0
5  13   Mary      F    College2   5th        90.0       94.0
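If the one-liner feels too long, the same loc assignment can be split into a named row condition and a named column list. The following is only a sketch built on the frame from the question. The names is_college2_9th and score_cols are made up here for illustration. The explicit cast to float64 is my own addition: NaN is a float, so the score columns end up as floats either way (which is why the output shows 89.0), and casting first avoids relying on implicit upcasting, which newer pandas versions may warn about.

```python
import numpy as np
import pandas as pd

students_df = pd.DataFrame({
    'ID': [2, 3, 5, 7, 11, 13],
    'Name': ['John', 'Jane', 'Sam', 'James', 'Stacy', 'Mary'],
    'Gender': ['M', 'F', 'F', 'M', 'F', 'F'],
    'school_name': ['College2', 'College2', 'College10', 'College2', 'College2', 'College2'],
    'grade': ['9th', '10th', '9th', '9th', '8th', '5th'],
    'math_score': [90, 89, 88, 89, 89, 90],
    'art_score': [90, 89, 89, 78, 90, 94],
})

# Hypothetical helper names, not part of the original question
score_cols = ['math_score', 'art_score']
is_college2_9th = (
    (students_df['school_name'] == 'College2')
    & (students_df['grade'] == '9th')
)

# Assumption: cast explicitly so the integer columns can hold NaN
students_df = students_df.astype({col: 'float64' for col in score_cols})

# Row selection and replacement happen in one loc assignment
students_df.loc[is_college2_9th, score_cols] = np.nan

print(students_df)
```

Keeping the mask in a variable also lets you reuse it, for example is_college2_9th.sum() to check how many rows were affected before overwriting them.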