Create Source and Target columns from consecutive rows



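I have the following problem.

Customer 1001 completes activity A and then activity C (after activity A). For each customer, I need to move the activity from the following row into a target column.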

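import numpy as np
import pandas as pd
# Input: one row per completed activity, in order
df = pd.DataFrame([[1001, 'A'], [1001,'C'], [1004, 'D'],[1005, 'C'], 
[1005,'D'], [1010, 'A'],[1010,'D'],[1010,'F']], columns=['CustomerNr','Activity'])
# Desired output (kept in a separate variable so the input df is not overwritten)
expected = pd.DataFrame([[1001, 'A','C'], [1004, 'D',np.nan],[1005, 'C','D'], 
[1010, 'A','D'],[1010,'D' ,'F']], columns=['CustomerNr','Source','Target'])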

You can use:

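# Activity of the following row (shift(-1) moves each value up by one row)
df['Target']=df['Activity'].shift(-1)
# Despite its name, this column holds the CustomerNr of the following row
df['prev_CustomerNr']=df['CustomerNr'].shift(-1)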
print(df)
'''
   CustomerNr Activity Target  prev_CustomerNr
0        1001        A      C           1001.0
1        1001        C      D           1004.0
2        1004        D      C           1005.0
3        1005        C      D           1005.0
4        1005        D      A           1010.0
5        1010        A      D           1010.0
6        1010        D      F           1010.0
7        1010        F   None              NaN
'''
# The last activity of each customer has no following activity, so drop the last row of every CustomerNr that has more than one row.
m1 = df.duplicated(['CustomerNr'], keep="last") # True for every row except the last one per customer; see https://stackoverflow.com/a/70216388/15415267
m2 = ~df.duplicated(['CustomerNr'], keep=False) # True for customers with a single row, which are kept
df = df[m1|m2].copy() # copy so the assignment below does not act on a slice
# Where the following row belongs to a different customer, set Target to NaN.
df['Target']=np.where(df['CustomerNr']==df['prev_CustomerNr'],df['Target'],np.nan)
df=df.drop(['prev_CustomerNr'],axis=1)
print(df)
'''
   CustomerNr Activity Target
0        1001        A      C
2        1004        D    NaN
3        1005        C      D
5        1010        A      D
6        1010        D      F
'''
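
As an alternative, the same result can probably be reached with a per-customer shift, which avoids the helper column. This is only a sketch, not the method used above: it assumes the rows are already in chronological order within each customer and that Activity itself contains no missing values. The names target, group_size, keep and result are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[1001, 'A'], [1001, 'C'], [1004, 'D'], [1005, 'C'],
     [1005, 'D'], [1010, 'A'], [1010, 'D'], [1010, 'F']],
    columns=['CustomerNr', 'Activity'],
)

# Next activity within the same customer; the last row of each customer gets NaN.
target = df.groupby('CustomerNr')['Activity'].shift(-1)

# Keep rows that have a next activity, plus customers with only one row.
group_size = df.groupby('CustomerNr')['Activity'].transform('size')
keep = target.notna() | (group_size == 1)

result = (
    df.assign(Target=target)[keep]
      .rename(columns={'Activity': 'Source'})
      .reset_index(drop=True)
)
print(result)
```

Because the shift is done inside each group, a value never leaks from one customer to the next, so no comparison against the neighbouring CustomerNr is needed.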
