Given the DataFrame:
import pandas as pd
df = pd.DataFrame({"ID": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3], "year": [2004, 2005, 2006, 2007, 2004, 2005, 2006, 2007, 2004, 2005, 2006, 2007], "dummy": [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]})
ID year dummy
1 2004 0
1 2005 1
1 2006 1
1 2007 0
2 2004 1
2 2005 1
2 2006 0
2 2007 0
3 2004 0
3 2005 0
3 2006 0
3 2007 0
I want to create a new column 'dummy1' from the 'dummy' column that keeps only the first 1 within each ID:
ID year dummy dummy1
1 2004 0 0
1 2005 1 1
1 2006 1 0
1 2007 0 0
2 2004 1 1
2 2005 1 0
2 2006 0 0
2 2007 0 0
3 2004 0 0
3 2005 0 0
3 2006 0 0
3 2007 0 0
If I use:
df['dummy1'] = df["dummy"].replace({1: 0})
it replaces every 1 with 0. How can I modify the code so that only the second 1 in each ID is replaced with 0?
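Combine the conditions dummy == 1 and dummy.shift(-1) == 1 (the next row is also 1), convert the result to integers, and assign it to the new column: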
>>> df['dummy1'] = ((df['dummy'].eq(1)) & (df['dummy'].shift(-1).eq(1))).astype(int)
ID year dummy dummy1
0 1 2004 0 0
1 1 2005 1 1
2 1 2006 1 0
3 1 2007 0 0
4 2 2004 1 1
5 2 2005 1 0
6 2 2006 0 0
7 2 2007 0 0
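The shift approach only looks at the next row, so it assumes every first 1 is immediately followed by a second 1, and because the shift is not grouped it also ignores ID boundaries. A minimal sketch on hypothetical data (not from the question) where ID 1 has a single 1: the shift version misses it, while a per-ID cumulative sum (a swapped-in technique, not part of this answer) marks the first 1 in each group.

```python
import pandas as pd

# Hypothetical data: ID 1 has a single 1, ID 2 has two consecutive 1s
df = pd.DataFrame({
    "ID":    [1, 1, 1, 2, 2, 2],
    "year":  [2004, 2005, 2006, 2004, 2005, 2006],
    "dummy": [0, 1, 0, 1, 1, 0],
})

# Shift-based flag from the answer above: 1 only if the next row is also 1
shift_flag = (df["dummy"].eq(1) & df["dummy"].shift(-1).eq(1)).astype(int)

# Alternative: a row is the first 1 of its ID when dummy == 1 and the
# running count of 1s within that ID has just reached 1
running = df.groupby("ID")["dummy"].cumsum()
first_flag = (df["dummy"].eq(1) & running.eq(1)).astype(int)

print(shift_flag.tolist())  # [0, 0, 0, 1, 0, 0]
print(first_flag.tolist())  # [0, 1, 0, 1, 0, 0]
```

The cumulative-sum version assumes the rows are already in year order within each ID.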
Try idxmax with transform:
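# "& dummy == 1" guards IDs with no 1 at all; otherwise idxmax would flag their first row
df['new'] = ((df.index == df.groupby('ID').dummy.transform('idxmax')) & df['dummy'].eq(1)).astype(int)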
df
ID year dummy new
0 1 2004 0 0
1 1 2005 1 1
2 1 2006 1 0
3 1 2007 0 0
4 2 2004 1 1
5 2 2005 1 0
6 2 2006 0 0
7 2 2007 0 0
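One caveat with idxmax: for a group whose dummy values are all 0, idxmax still returns the label of that group's first row, so comparing against the index alone flags that row. A sketch on the question's data showing the effect on ID 3, plus one possible guard (also requiring dummy == 1):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "year": [2004, 2005, 2006, 2007] * 3,
    "dummy": [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
})

# Label of the first maximum of 'dummy' within each ID, broadcast to every row
first_max = df.groupby("ID")["dummy"].transform("idxmax")

# Unguarded: ID 3 has no 1s, but its first row (label 8) is still flagged
unguarded = (df.index == first_max).astype(int)

# Guarded: also require the row itself to hold a 1
guarded = ((df.index == first_max) & df["dummy"].eq(1)).astype(int)

print(unguarded.tolist())  # [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
print(guarded.tolist())    # [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

This also assumes the default RangeIndex with unique labels, since idxmax returns index labels rather than positions.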
Without groupby, using drop_duplicates:
df['dummy1'] = 0
df.loc[df[df['dummy'].eq(1)].drop_duplicates('ID').index, 'dummy1'] = 1
Output:
>>> df
ID year dummy dummy1
0 1 2004 0 0
1 1 2005 1 1
2 1 2006 1 0
3 1 2007 0 0
4 2 2004 1 1
5 2 2005 1 0
6 2 2006 0 0
7 2 2007 0 0
8 3 2004 0 0
9 3 2005 0 0
10 3 2006 0 0
11 3 2007 0 0
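drop_duplicates('ID') keeps the first occurrence in the current row order, so this answer relies on the rows already being sorted by year within each ID. A sketch assuming the rows may arrive shuffled (hypothetical input, not from the question): sort by ID and year first, then apply the same drop_duplicates idea.

```python
import pandas as pd

# Hypothetical shuffled input: for ID 1, the 2006 row comes before 2005
df = pd.DataFrame({
    "ID":    [1, 1, 1, 2, 2],
    "year":  [2006, 2005, 2004, 2005, 2004],
    "dummy": [1, 1, 0, 1, 1],
})

# Restore chronological order within each ID before picking the first 1
df = df.sort_values(["ID", "year"]).reset_index(drop=True)

# Start from all zeros, then set 1 on the first dummy == 1 row of each ID
df["dummy1"] = 0
first_ones = df[df["dummy"].eq(1)].drop_duplicates("ID").index
df.loc[first_ones, "dummy1"] = 1

print(df["dummy1"].tolist())  # [0, 1, 0, 1, 0]
```

Without the sort, the 2006 row of ID 1 would be picked instead of the 2005 row.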
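import numpy as np

# Running maximum per ID: 0 until the first 1, then 1 from there on
df['dum'] = df.groupby('ID')['dummy'].cummax()
conditions = [
    (df['dum'] == 0),   # no 1 seen yet in this ID (covers IDs with no 1 at all)
    (df['dum'] == 1)    # at or after the first 1 of this ID
]
values = [0, (df.index == df.groupby('ID').dummy.transform('idxmax')).astype(int)]
df['dummy1'] = np.select(conditions, values)
df = df.drop(columns='dum')   # remove the helper column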
ID year dummy dummy1
1 2004 0 0
1 2005 1 1
1 2006 1 0
1 2007 0 0
2 2004 1 1
2 2005 1 0
2 2006 0 0
2 2007 0 0
3 2004 0 0
3 2005 0 0
3 2006 0 0
3 2007 0 0
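The cummax helper can also produce the result on its own: within each ID, the first 1 is the row where the running maximum first steps from 0 to 1. A sketch of that variant, which avoids both np.select and idxmax (a rearrangement of this answer's idea, not its original code), assuming rows are in year order within each ID:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "year": [2004, 2005, 2006, 2007] * 3,
    "dummy": [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
})

# Running maximum of 'dummy' within each ID: 0 until the first 1, then 1
run_max = df.groupby("ID")["dummy"].cummax()

# Previous row's running maximum in the same ID (0 for each ID's first row)
prev_max = run_max.groupby(df["ID"]).shift(fill_value=0)

# The first 1 is where the running maximum jumps from 0 to 1
df["dummy1"] = (run_max - prev_max).eq(1).astype(int)

print(df["dummy1"].tolist())  # [0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```

IDs with no 1 (like ID 3) never see the jump, so they stay at 0 without a separate guard.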