How to spread the values of one column into other columns of a single row (pandas DataFrame)



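I have raw data in this format (the input is actually a huge Excel sheet). Because one column holds several distinct values for the same person, that person appears on multiple rows, somewhat like the example below: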

import pandas as pd

data = [['tom', 10, 'vanilla'], ['tom', 10, 'chocolate'], ['nick', 15, 'strawberry'], ['nick', 15, 'butter pecan'], ['nick', 15, 'mint chocolate chip'], ['juli', 14, 'strawberry'], ['juli', 14, 'chocolate'], ['juli', 14, 'vanilla']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Favorite Ice Cream'])

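But from that raw data I need to build a DataFrame in the format below, where the values of the last column are spread across several columns so that each person occupies a single row and the duplicate rows are eliminated: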

data2 = [['tom', 10, 'vanilla', 'chocolate', '', ''], ['nick', 15, 'butter pecan', '', 'strawberry', 'mint chocolate chip'], ['juli', 14, 'vanilla', 'chocolate', 'strawberry', '']]
df2 = pd.DataFrame(data2, columns = ['Name', 'Age', 'Favorite Vanilla-based Ice Cream',  'Favorite Chocolate-based Ice Cream',  'Favorite Berry Ice Cream', 'Favorite Other Ice Cream'])

Thanks for your help!

Map each flavour to a category with a dict, then use pivot:

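# Group the flavours into categories
mapping = {'Vanilla': ['vanilla', 'butter pecan'],
           'Chocolate': ['chocolate'],
           'Berry': ['strawberry']}
# Invert the mapping: flavour -> category
reverse_mapping = {v: k for k, l in mapping.items() for v in l}
# Flavours missing from the mapping fall into 'Other'
df['X-based'] = df['Favorite Ice Cream'].map(reverse_mapping).fillna('Other')
# One row per (Name, Age), one column per category; the outer
# parentheses let the method chain continue across lines
out = (df.pivot(index=['Name', 'Age'],
                columns='X-based',
                values='Favorite Ice Cream')
         .rename_axis(columns=None)
         .reset_index()
         .fillna(''))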

Output:

>>> out
   Name  Age       Berry  Chocolate                Other       Vanilla
0  juli   14  strawberry  chocolate                            vanilla
1  nick   15  strawberry             mint chocolate chip  butter pecan
2   tom   10              chocolate                            vanilla
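The pivot names its columns after the categories (Berry, Chocolate, Other, Vanilla) in alphabetical order, while df2 in the question uses longer names in a vanilla/chocolate/berry/other order. A minimal sketch of closing that gap, assuming the same four categories: reindex the pivoted columns into the desired order, then rename them. `target_names` is a hypothetical helper dict introduced here, not part of the answer above, and the sample data is rebuilt so the block runs on its own. Note that passing a list to `index=` in `pivot` needs pandas 1.1 or later.

```python
import pandas as pd

# Raw data from the question: one row per (person, flavour)
data = [['tom', 10, 'vanilla'], ['tom', 10, 'chocolate'],
        ['nick', 15, 'strawberry'], ['nick', 15, 'butter pecan'],
        ['nick', 15, 'mint chocolate chip'],
        ['juli', 14, 'strawberry'], ['juli', 14, 'chocolate'],
        ['juli', 14, 'vanilla']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Favorite Ice Cream'])

# Same flavour -> category mapping as in the answer
mapping = {'Vanilla': ['vanilla', 'butter pecan'],
           'Chocolate': ['chocolate'],
           'Berry': ['strawberry']}
reverse_mapping = {v: k for k, l in mapping.items() for v in l}
df['X-based'] = df['Favorite Ice Cream'].map(reverse_mapping).fillna('Other')

# Hypothetical helper: category -> df2 column name; the key order is the
# column order wanted in the result
target_names = {'Vanilla': 'Favorite Vanilla-based Ice Cream',
                'Chocolate': 'Favorite Chocolate-based Ice Cream',
                'Berry': 'Favorite Berry Ice Cream',
                'Other': 'Favorite Other Ice Cream'}

out = (df.pivot(index=['Name', 'Age'], columns='X-based',
                values='Favorite Ice Cream')
         .reindex(columns=list(target_names))  # fix order, add any missing category
         .rename(columns=target_names)         # e.g. Berry -> 'Favorite Berry Ice Cream'
         .rename_axis(columns=None)
         .fillna('')
         .reset_index())
print(out)
```

Rows still come out sorted by Name (juli, nick, tom) because pivot sorts its index, so sort the rows afterwards if the original order matters. The reindex step also means a category with no matching flavours still shows up, as a column of empty strings.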

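One caveat about pivot: it raises a ValueError when the same (Name, Age, category) combination occurs more than once, for example if someone listed both vanilla and butter pecan. A sketch of one workaround, using pivot_table with a string-joining aggfunc so such duplicates are combined into one cell; the data below is a made-up example, not taken from the question.

```python
import pandas as pd

# Made-up data (not from the question): tom lists two vanilla-based
# flavours, so (tom, 10, 'Vanilla') occurs twice
data = [['tom', 10, 'vanilla'], ['tom', 10, 'butter pecan'],
        ['tom', 10, 'chocolate'], ['nick', 15, 'strawberry']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Favorite Ice Cream'])

mapping = {'Vanilla': ['vanilla', 'butter pecan'],
           'Chocolate': ['chocolate'],
           'Berry': ['strawberry']}
reverse_mapping = {v: k for k, l in mapping.items() for v in l}
df['X-based'] = df['Favorite Ice Cream'].map(reverse_mapping).fillna('Other')

# df.pivot(...) would raise ValueError here because of the duplicate;
# pivot_table aggregates each cell instead, joining the flavours with ', '
out = (df.pivot_table(index=['Name', 'Age'], columns='X-based',
                      values='Favorite Ice Cream', aggfunc=', '.join)
         .rename_axis(columns=None)
         .reset_index()
         .fillna(''))
print(out)
```

Joining with ', ' is just one choice of aggfunc; `'first'` would keep only one flavour per cell instead.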