I need to transform the following DataFrame from this:
class_id  instructor_id
1         10
2         10
2         20
3         30
3         40
3         50
to this:
class_id  instructor_id  instructor_id_2  instructor_id_3
1         10
2         10             20
3         30             40               50
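The number of distinct instructor_id columns should be determined dynamically from the number of instructor_id values associated with each class_id. The column names should continue the same instructor_id_x pattern.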
Use groupby + apply(list), followed by apply(pd.Series):
df1 = df.groupby('class_id')['instructor_id'].apply(list).apply(pd.Series)
# alternative df.groupby('class_id')['instructor_id'].apply(lambda x: pd.Series(x.tolist())).unstack()
df1.columns = ['instructor_id']+['instructor_id_'+str(i+1) for i in df1.columns[1:]]
df1.reset_index(inplace=True)
print(df1)
   class_id  instructor_id  instructor_id_2  instructor_id_3
0         1           10.0              NaN              NaN
1         2           10.0             20.0              NaN
2         3           30.0             40.0             50.0
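For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The sample data is copied from the question; the function name widen_with_lists is just a hypothetical wrapper for illustration, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class_id": [1, 2, 2, 3, 3, 3],
    "instructor_id": [10, 10, 20, 30, 40, 50],
})


def widen_with_lists(frame):
    """Hypothetical helper: one row per class_id, one column per instructor."""
    # Collect each class's instructors into a list, then expand every list
    # into its own row; shorter lists are padded with NaN.
    wide = frame.groupby("class_id")["instructor_id"].apply(list).apply(pd.Series)
    # The expanded columns are the integers 0, 1, 2, ...; rename them to
    # instructor_id, instructor_id_2, instructor_id_3, ...
    wide.columns = ["instructor_id"] + [
        f"instructor_id_{i + 1}" for i in wide.columns[1:]
    ]
    return wide.reset_index()


df1 = widen_with_lists(df)
print(df1)
```

Because the shorter rows are padded with NaN, the instructor columns come back as floats, which is why the output above shows 10.0 rather than 10.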
groupby + cumcount + unstack
Here is one approach that uses a helper series, key:
# number each instructor within its class (1, 2, 3, ...) and turn that into a column label
key = (df.groupby('class_id')['instructor_id'].cumcount()
         .add(1).map('Instructor_{}'.format))
res = df.set_index(['class_id', key]).unstack().reset_index()
# clean up column names
res.columns = res.columns.droplevel(0)
res = res.rename(columns={'': 'class_id'})
print(res)
   class_id  Instructor_1  Instructor_2  Instructor_3
0         1          10.0           NaN           NaN
1         2          10.0          20.0           NaN
2         3          30.0          40.0          50.0
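If the float output is a problem, one possible variation (not from the answer above) is to use pivot in place of set_index + unstack and then cast to pandas' nullable Int64 dtype. This is a sketch that assumes a reasonably recent pandas version; the column name slot is a hypothetical temporary label.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class_id": [1, 2, 2, 3, 3, 3],
    "instructor_id": [10, 10, 20, 30, 40, 50],
})

# Position of each instructor within its class: 1, 2, 3, ...
pos = df.groupby("class_id").cumcount().add(1)

res = (
    df.assign(slot=pos.map("Instructor_{}".format))  # "slot" is a hypothetical temporary column
      .pivot(index="class_id", columns="slot", values="instructor_id")
      .astype("Int64")                # nullable integers: gaps become <NA>, not NaN floats
      .rename_axis(columns=None)      # drop the leftover "slot" label on the columns
      .reset_index()
)
print(res)
```

One caveat: pivot sorts the new column labels as strings, so with ten or more instructors in a class, Instructor_10 would sort before Instructor_2 and the columns would need reordering.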