我是python的新手,所以请宽恕我:(
比方说,有一个像这样的数据帧
ID B C D E isDuplicated
1 Blue Green Blue Pink false
2 Red Green Red Green false
3 Red Orange Yellow Green false
4 Blue Pink Blue Pink false
5 Blue Orange Pink Green false
6 Blue Orange Pink Green true
7 Red Orange Yellow Green true
8 Red Orange Yellow Green true
如果我在子集=B,C,D,E的行中有重复项。然后我想添加另一列"firstOccurred",它应该具有第一次出现的ID。我想要的数据帧应该是这样的:
ID B C D E isDuplicated firstOccurred
1 Blue Green Blue Pink false
2 Red Green Red Green false
3 Red Orange Yellow Green false
4 Blue Pink Blue Pink false
5 Blue Orange Pink Green false
6 Blue Orange Pink Green true 5
7 Red Orange Yellow Green true 3
8 Red Orange Yellow Green true 3
如果有任何帮助,我将不胜感激!提前谢谢!
仅将GroupBy.transform
和first
用于在numpy.where
:中传递True
的行
df['firstOccurred'] = np.where(df['isDuplicated'],
df.groupby(['B','C','D','E'])['ID'].transform('first'),
np.nan)
print (df)
ID B C D E isDuplicated firstOccurred
0 1 Blue Green Blue Pink False NaN
1 2 Red Green Red Green False NaN
2 3 Red Orange Yellow Green False NaN
3 4 Blue Pink Blue Pink False NaN
4 5 Blue Orange Pink Green False NaN
5 6 Blue Orange Pink Green True 5.0
6 7 Red Orange Yellow Green True 3.0
7 8 Red Orange Yellow Green True 3.0