I have this sample DataFrame; the real DataFrame has about 35 different columns, with many more categories/subcategories.
id drink_bottle drink_can food_napkin food_wrapper location
1 5 5 2 6 mall
2 2 4 3 1 mall
3 7 2 0 1 mall
4 0 3 9 2 urban
5 3 1 4 4 urban
6 2 4 5 3 urban
7 2 1 4 3 urban
8 7 1 2 0 urban
9 0 2 3 9 urban
10 3 4 4 4 recreational
11 2 3 5 5 recreational
12 5 1 0 3 recreational
I want to turn it into this:
id category subcategory amount location
1 drink bottle 5 mall
1 drink can 5 mall
1 food napkin 2 mall
1 food wrapper 6 mall
2 drink bottle 2 mall
2 drink can 4 mall
2 food napkin 3 mall
2 food wrapper 1 mall
3 drink bottle 7 mall
3 drink can 2 mall
3 food napkin 0 mall
3 food wrapper 1 mall
4 drink bottle 0 urban
4 drink can 3 urban
4 food napkin 9 urban
4 food wrapper 2 urban
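I have tried different things, such as the transpose function and using parts of the column-name strings to add values, but I really don't know what the best way to approach a problem like this is. I don't know how to rearrange the data, and the solution also needs to scale in case new columns are added. Can someone point me in the right direction?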
import pandas as pd

d = {'drink_bottle': {1: 5, 2: 2, 3: 7, 4: 0},
     'drink_can': {1: 5, 2: 4, 3: 2, 4: 3},
     'food_napkin': {1: 2, 2: 3, 3: 0, 4: 9},
     'food_wrapper': {1: 6, 2: 1, 3: 1, 4: 2},
     'location': {1: 'mall', 2: 'mall', 3: 'mall', 4: 'urban'},
     'id': {1: 1, 2: 2, 3: 3, 4: 4}}
df = pd.DataFrame(data=d)
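You can move the columns whose names do not contain _ into the index with DataFrame.set_index, then split the remaining column names with Series.str.split and reshape by both levels with DataFrame.stack: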
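# Keep the identifier columns out of the reshape by moving them to the index.
df1 = df.set_index(['id','location'])
# Split 'drink_bottle' into ('drink', 'bottle'), giving two-level (MultiIndex) columns.
df1.columns = df1.columns.str.split('_', expand=True)
# Name both column levels, stack them into rows, and name the value column.
df1 = (df1.rename_axis(['category', 'subcategory'], axis=1)
          .stack([0,1])
          .reset_index(name='amount'))
print(df1)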
id location category subcategory amount
0 1 mall drink bottle 5.0
1 1 mall drink can 5.0
2 1 mall food napkin 2.0
3 1 mall food wrapper 6.0
4 2 mall drink bottle 2.0
5 2 mall drink can 4.0
6 2 mall food napkin 3.0
7 2 mall food wrapper 1.0
8 3 mall drink bottle 7.0
9 3 mall drink can 2.0
10 3 mall food napkin 0.0
11 3 mall food wrapper 1.0
12 4 urban drink bottle 0.0
13 4 urban drink can 3.0
14 4 urban food napkin 9.0
15 4 urban food wrapper 2.0
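For comparison, the same reshape can also be sketched with DataFrame.melt followed by a string split on the melted column names. This is only an alternative under some assumptions, not the method above: the helper name to_long, its id_cols parameter, and the extra food_bag column are hypothetical and exist only for illustration. It also assumes that every non-identifier column follows the category_subcategory naming pattern.

```python
import pandas as pd

# Sample data from the question (first four rows).
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'drink_bottle': [5, 2, 7, 0],
    'drink_can': [5, 4, 2, 3],
    'food_napkin': [2, 3, 0, 9],
    'food_wrapper': [6, 1, 1, 2],
    'location': ['mall', 'mall', 'mall', 'urban'],
})


def to_long(frame, id_cols=('id', 'location')):
    """Hypothetical helper: reshape wide category_subcategory columns to long form."""
    # Every column not listed in id_cols is treated as a value column.
    long = frame.melt(id_vars=list(id_cols), var_name='name', value_name='amount')
    # Split on the first underscore only, so a subcategory may itself contain underscores.
    parts = long['name'].str.split('_', n=1, expand=True)
    long['category'] = parts[0]
    long['subcategory'] = parts[1]
    long = long.drop(columns='name')
    long = long.sort_values(['id', 'category', 'subcategory']).reset_index(drop=True)
    # Column order requested in the question.
    return long[['id', 'category', 'subcategory', 'amount', 'location']]


long_df = to_long(df)
print(long_df)

# A newly added column (hypothetical name) is picked up without changing the helper.
wider = df.assign(food_bag=[1, 0, 2, 4])
long_wider = to_long(wider)
print(long_wider)
```

Because melt treats every column outside id_cols as a value column, new category_subcategory columns should need no code changes. Any additional identifier column would have to be added to id_cols.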