How to replace values in a column based on conditions on multiple columns in pandas



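I want to replace the value in the 'Risk Rating' column if and only if three conditions on three different columns of the dataframe are met. I tried a boolean mask and also the .loc method, but neither works for me. Only 9 rows should be affected: in this single case I want to change the 'Risk Rating' value from 0 to 9. The dataframe has 180002 rows. Here is the code I wrote: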

safety.loc[((safety['Employee Name']=="Shabbir Hussain") & (safety['Employee Number']==11231) & 
(safety['Attendance Date']=="2020-03-12")),['Risk Rating']]=9
mask = ((safety['Employee Name']=="Shakir Hussain") & (safety['Employee Number']==11026) &
(safety['Attendance Date']=="2020-03-12") & (safety['Risk Rating']==0))
safety['Risk Rating'][mask]=9

# wrap the whole expression in parentheses so it can span several lines
mask = ((safety['Employee Name']=="Shakir Hussain") &
        (safety['Employee Number']==11026) &
        (safety['Attendance Date']=="2020-03-12") &
        (safety['Risk Rating']==0))

If you want to assign values conditionally, use .loc to select the rows that match the mask and assign to them directly.

safety.loc[mask, 'Risk Rating']=9
Alternatively, you can apply the same mask with numpy's select:
import numpy as np
safety['Risk Rating'] = np.select([mask], [9], default=safety['Risk Rating'])
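
To make the two options above concrete, here is a minimal sketch on a made-up three-row frame. The column names come from the question; the rows themselves are hypothetical, and 'Attendance Date' is assumed to be stored as plain strings.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
safety = pd.DataFrame({
    "Employee Name": ["Shakir Hussain", "Shakir Hussain", "Shabbir Hussain"],
    "Employee Number": [11026, 11026, 11231],
    "Attendance Date": ["2020-03-12", "2020-03-13", "2020-03-12"],
    "Risk Rating": [0, 0, 0],
})

# Parentheses around the whole expression allow it to span several lines.
mask = (
    (safety["Employee Name"] == "Shakir Hussain")
    & (safety["Employee Number"] == 11026)
    & (safety["Attendance Date"] == "2020-03-12")
    & (safety["Risk Rating"] == 0)
)

# Option 1: assign in place to the rows selected by the mask.
by_loc = safety.copy()
by_loc.loc[mask, "Risk Rating"] = 9

# Option 2: rebuild the column, keeping the old value wherever the mask is False.
by_select = safety.copy()
by_select["Risk Rating"] = np.select([mask], [9], default=safety["Risk Rating"])

print(by_loc["Risk Rating"].tolist())     # [9, 0, 0]
print(by_select["Risk Rating"].tolist())  # [9, 0, 0]
```

On the real data, mask.sum() reports how many rows matched, which is a quick way to check that the expected 9 rows are selected. If it returns 0, comparing safety.dtypes against the values used in the conditions may show a type mismatch.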

Improving on @Bikhyat Adhiakri's answer: since you will be processing thousands of rows, use numpy instead:

import numpy as np
arr = safety.to_numpy()
# replace 0, 1, 2 with the positions of the name, number and date columns
mask = (arr[:,0] == "Shakir Hussain") & (arr[:,1] == 11026) & (arr[:,2] == "2020-03-12")
# replace 4 with the position of 'Risk Rating'; the updated data is in the numpy array arr, not in safety
arr[mask,4] = 9
# or write the change back to the DataFrame with
# safety.loc[mask, 'Risk Rating'] = 9

For a large number of rows, numpy may speed this up by as much as 1000x.

See: https://stackoverflow.com/a/64504183/11671779
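
As a variation on the numpy approach (not the answerer's exact code), the mask can also be built from per-column arrays and then handed back to .loc, so the result ends up in the DataFrame rather than in a separate array. This is only a sketch on hypothetical toy rows, again assuming 'Attendance Date' holds strings; no timing claim is made for it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
safety = pd.DataFrame({
    "Employee Name": ["Shakir Hussain", "Shakir Hussain", "Shabbir Hussain"],
    "Employee Number": [11026, 11026, 11231],
    "Attendance Date": ["2020-03-12", "2020-03-13", "2020-03-12"],
    "Risk Rating": [0, 0, 0],
})

# Pull each column out as a NumPy array, addressed by name rather than position.
name = safety["Employee Name"].to_numpy()
number = safety["Employee Number"].to_numpy()
date = safety["Attendance Date"].to_numpy()

# Element-wise comparisons combined with & give a plain boolean array.
np_mask = (name == "Shakir Hussain") & (number == 11026) & (date == "2020-03-12")

# .loc accepts a boolean array with one entry per row.
safety.loc[np_mask, "Risk Rating"] = 9

print(safety["Risk Rating"].tolist())  # [9, 0, 0]
```

Selecting columns by name avoids having to keep the positional indices in sync with the column order.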
