I currently have a table like this:
ID Previous_Injuries Currently_Injured Injury_Type
1 Nan 0 Nan
1 Nan 1 Ankle
1 Nan 0 Nan
1 Nan 1 Wrist
1 Nan 0 Nan
1 Nan 1 Leg
1 Nan 0 Nan
2 Nan 1 Leg
2 Nan 0 Nan
I want to fill in the Previous_Injuries column so that my table looks like this:
ID Previous_Injuries Currently_Injured Injury_Type
1 Nan 0 Nan
1 Nan 1 Ankle
1 [Ankle] 0 Nan
1 [Ankle] 1 Wrist
1 [Ankle,Wrist] 0 Nan
1 [Ankle,Wrist] 1 Leg
1 [Ankle,Wrist,Leg] 0 Nan
2 Nan 1 Leg
2 [Leg] 0 Nan
How can I build a column like this in pandas? Is a list the best way to store it?
Thanks!
We can do this with shift and cumsum, then str.split to break the accumulated string into a list. Note that Nan here is a string, not np.nan:
s=df.Injury_Type.shift().fillna('Nan').add(',').cumsum().str[:-1].str.split(',')
df['new']=[[y for y in x if y != 'Nan'] for x in s ]
df
Out[322]:
ID Previous_Injuries Currently_Injured Injury_Type new
0 1 Nan 0 Nan []
1 1 Nan 1 Ankle []
2 1 Nan 0 Nan [Ankle]
3 1 Nan 1 Wrist [Ankle]
4 1 Nan 0 Nan [Ankle, Wrist]
5 1 Nan 1 Leg [Ankle, Wrist]
6 1 Nan 0 Nan [Ankle, Wrist, Leg]
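To make the chained one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps for a single ID. The sample Series and the variable names (shifted, running, tokens, previous) are made up for illustration, and the sketch assumes the column holds object-dtype strings with the literal string 'Nan' for missing values, as in the question.

```python
import pandas as pd

# Injury_Type for a single ID; 'Nan' is the literal string used in the question.
injury = pd.Series(["Nan", "Ankle", "Nan", "Wrist", "Nan"], dtype=object)

# Step 1: shift down one row so each row sees the previous row's value.
shifted = injury.shift().fillna("Nan")

# Step 2: append a comma to every value and concatenate cumulatively.
running = shifted.add(",").cumsum()

# Step 3: drop the trailing comma and split the string into a list of tokens.
tokens = running.str[:-1].str.split(",")

# Step 4: discard the 'Nan' placeholders, keeping only real injuries.
previous = [[t for t in row if t != "Nan"] for row in tokens]

print(running.tolist())
print(previous)
```

The shift is what keeps the current row's injury out of its own list: each row only accumulates values from the rows above it.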
The question changed again! To handle each ID separately:
l = []
for name, dfx in df.groupby('ID'):
    dfx = dfx.copy()  # copy the group so adding a column does not trigger SettingWithCopyWarning
    s = dfx.Injury_Type.shift().fillna('Nan').add(',').cumsum().str[:-1].str.split(',')
    dfx['new'] = [[y for y in x if y != 'Nan'] for x in s]
    l.append(dfx)
pd.concat(l)
Use:
import numpy as np
df['Previous_Injuries'] = (df['Injury_Type'].replace('Nan', np.nan).fillna(' ')
                           .cumsum().shift(fill_value='')
                           .str.split())
print(df)
If the missing values are real NaN rather than the string 'Nan', you can omit replace('Nan', np.nan).
ID Previous_Injuries Currently_Injured Injury_Type
0 1 [] 0 Nan
1 1 [] 1 Ankle
2 1 [Ankle] 0 Nan
3 1 [Ankle] 1 Wrist
4 1 [Ankle, Wrist] 0 Nan
5 1 [Ankle, Wrist] 1 Leg
6 1 [Ankle, Wrist, Leg] 0 Nan
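One caveat worth checking against your own data: the chain above only inserts a space on the missing rows, so it relies on every injury row being followed by a 'Nan' row, as in the sample. The sketch below uses hypothetical data with two injuries on consecutive rows to show the names fusing, plus one possible variant (previous_injuries is an illustrative helper, not part of the answer) that appends a space to every row instead. It assumes object-dtype strings and injury names without spaces.

```python
import pandas as pd

# Hypothetical data: two injuries on consecutive rows, no 'Nan' row between them.
injury = pd.Series(["Ankle", "Wrist", "Nan"], dtype=object)

# Same idea as the chain above (for the string 'Nan'): a space only on missing rows.
fused = injury.replace("Nan", " ").cumsum().shift(fill_value="").str.split().tolist()

def previous_injuries(col):
    """Variant: give every row its own trailing space so adjacent names never fuse."""
    joined = col.replace("Nan", "").add(" ").cumsum()
    return joined.shift(fill_value="").str.split()

separated = previous_injuries(injury).tolist()

print(fused)      # the last row contains the fused token 'AnkleWrist'
print(separated)  # the last row contains 'Ankle' and 'Wrist' as separate items
```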
Use DataFrame.groupby to handle each ID separately:
# group_keys=False keeps the original row index so the result aligns with df
df['Previous_Injuries'] = (df.groupby('ID', group_keys=False)['Injury_Type']
                           .apply(lambda x: x.replace('Nan', np.nan).fillna(' ')
                                             .cumsum().shift(fill_value='')
                                             .str.split()))
print(df)
ID Previous_Injuries Currently_Injured Injury_Type
0 1 [] 0 Nan
1 1 [] 1 Ankle
2 1 [Ankle] 0 Nan
3 1 [Ankle] 1 Wrist
4 1 [Ankle, Wrist] 0 Nan
5 1 [Ankle, Wrist] 1 Leg
6 1 [Ankle, Wrist, Leg] 0 Nan
7 2 [] 1 Leg
8 2 [Leg] 0 Nan
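If the injury names might contain spaces or commas, the string-concatenation trick becomes fragile. As an alternative that is not taken from the answers above, here is a rough sketch that walks the rows once and keeps a running list per ID. The helper name previous_lists and the shortened sample frame are assumptions for illustration, and it assumes the rows are already in chronological order.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2],
    "Currently_Injured": [0, 1, 0, 1, 1, 0],
    "Injury_Type": ["Nan", "Ankle", "Nan", "Wrist", "Leg", "Nan"],
})

def previous_lists(ids, injuries, missing="Nan"):
    """For each row, list the injuries seen on earlier rows with the same ID."""
    history = {}  # ID -> injuries recorded so far
    out = []
    for key, injury in zip(ids, injuries):
        seen = history.setdefault(key, [])
        out.append(list(seen))  # snapshot taken before this row's own injury
        if pd.notna(injury) and injury != missing:
            seen.append(injury)
    return out

df["Previous_Injuries"] = previous_lists(df["ID"], df["Injury_Type"])
print(df)
```

This trades the vectorised chain for an explicit loop, which is slower on very large frames but makes no assumptions about separators inside the injury names.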