Dynamically converting multiple rows with a matching index into multiple columns using pandas



I need to convert the following DataFrame from this:

class_id     instructor_id
1                10
2                10
2                20
3                30
3                40
3                50

to this:

class_id   instructor_id   instructor_id_2   instructor_id_3
1          10
2          10              20
3          30              40                50

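The number of instructor_id columns should be determined dynamically from the number of instructor_id values associated with each class_id. The column names should continue the same instructor_id_x pattern.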
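For anyone who wants to reproduce this, here is a minimal sketch that builds the sample frame from the table above and checks how many instructor columns the wide result will need. The names `counts` and `n_cols` are only illustrative and do not come from the original post.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "class_id": [1, 2, 2, 3, 3, 3],
    "instructor_id": [10, 10, 20, 30, 40, 50],
})

# Instructors per class; the largest count is the number of
# instructor columns the reshaped frame needs.
counts = df.groupby("class_id")["instructor_id"].size()
n_cols = int(counts.max())
print(n_cols)  # 3, because class 3 has three instructors
```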

Use groupby + apply(list) + apply(pd.Series):

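import pandas as pd

# Collect each class's instructors into a list, then expand each list into columns 0, 1, 2, ...
df1 = df.groupby('class_id')['instructor_id'].apply(list).apply(pd.Series)
# Alternative: df.groupby('class_id')['instructor_id'].apply(lambda x: pd.Series(x.tolist())).unstack()
# Keep 'instructor_id' for the first column and name the rest instructor_id_2, instructor_id_3, ...
df1.columns = ['instructor_id'] + ['instructor_id_' + str(i + 1) for i in df1.columns[1:]]
df1.reset_index(inplace=True)
print(df1)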
   class_id  instructor_id  instructor_id_2  instructor_id_3
0         1           10.0              NaN              NaN
1         2           10.0             20.0              NaN
2         3           30.0             40.0             50.0
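The IDs print as floats because the missing cells are NaN, which forces a float dtype. If integer IDs matter, one option is to cast to the nullable `Int64` dtype. This is a sketch and not part of the original answer: it rebuilds the sample data so it runs on its own, and the name `wide` is just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "class_id": [1, 2, 2, 3, 3, 3],
    "instructor_id": [10, 10, 20, 30, 40, 50],
})

# Same reshaping as in the answer above
wide = df.groupby("class_id")["instructor_id"].apply(list).apply(pd.Series)
wide.columns = ["instructor_id"] + [
    f"instructor_id_{i + 1}" for i in wide.columns[1:]
]

# Cast the instructor columns to nullable integers, so missing cells
# become <NA> instead of turning the whole column into floats.
wide = wide.astype("Int64").reset_index()
print(wide)
```

The cast is done before `reset_index()`, so `class_id` keeps its plain integer dtype.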

groupby+cumcount+unstack

Here is one approach that uses a helper series, key:

# Helper series: 1-based position of each instructor within its class, formatted as a column label
key = (df.groupby('class_id')['instructor_id'].cumcount()
         .add(1).map('Instructor_{}'.format))
res = df.set_index(['class_id', key]).unstack().reset_index()
# clean up column names
res.columns = res.columns.droplevel(0)
res = res.rename(columns={'': 'class_id'})
print(res)
   class_id  Instructor_1  Instructor_2  Instructor_3
0         1          10.0           NaN           NaN
1         2          10.0          20.0           NaN
2         3          30.0          40.0          50.0
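The output above uses Instructor_1, Instructor_2, ... as labels, whereas the question asks for instructor_id, instructor_id_2, and so on. A possible variation is sketched below. It swaps `set_index` + `unstack` for `DataFrame.pivot`, which is a different call from the one in the answer, and the helper column name `pos` is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "class_id": [1, 2, 2, 3, 3, 3],
    "instructor_id": [10, 10, 20, 30, 40, 50],
})

# 1-based position of each instructor within its class
pos = df.groupby("class_id").cumcount() + 1

# One row per class, one column per position
wide = df.assign(pos=pos).pivot(
    index="class_id", columns="pos", values="instructor_id"
)

# Position 1 keeps the plain name; later positions get a numeric suffix
wide.columns = [
    "instructor_id" if p == 1 else f"instructor_id_{p}" for p in wide.columns
]
wide = wide.reset_index()
print(wide)
```

As with the other approaches, the number of columns follows from the data: a class with a fourth instructor would add instructor_id_4 without any change to the code.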
