Transpose data and split columns



I have this sample data frame. The real data frame has about 35 different columns, with more categories/subcategories.

id  drink_bottle  drink_can  food_napkin  food_wrapper      location

1              5          5            2             6          mall
2              2          4            3             1          mall
3              7          2            0             1          mall
4              0          3            9             2         urban
5              3          1            4             4         urban
6              2          4            5             3         urban
7              2          1            4             3         urban
8              7          1            2             0         urban
9              0          2            3             9         urban
10             3          4            4             4  recreational
11             2          3            5             5  recreational
12             5          1            0             3  recreational

I want to turn it into this:

id category subcategory  amount location

1     drink      bottle       5     mall
1     drink         can       5     mall
1      food      napkin       2     mall
1      food     wrapper       6     mall
2     drink      bottle       2     mall
2     drink         can       4     mall
2      food      napkin       3     mall
2      food     wrapper       1     mall
3     drink      bottle       7     mall
3     drink         can       2     mall
3      food      napkin       0     mall
3      food     wrapper       1     mall
4     drink      bottle       0    urban
4     drink         can       3    urban
4      food      napkin       9    urban
4      food     wrapper       2    urban

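I have tried different things, such as the transpose function and using part of the column-name string to add values, but I really don't know what the best way to solve a problem like this is. I don't know how to rearrange the data, and the solution also needs to scale when new columns are added. Can someone point me in the right direction?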

import pandas as pd

d = {'drink_bottle': {1: 5, 2: 2, 3: 7, 4: 0},
     'drink_can': {1: 5, 2: 4, 3: 2, 4: 3},
     'food_napkin': {1: 2, 2: 3, 3: 0, 4: 9},
     'food_wrapper': {1: 6, 2: 1, 3: 1, 4: 2},
     'location': {1: 'mall', 2: 'mall', 3: 'mall', 4: 'urban'},
     'id': {1: 1, 2: 2, 3: 3, 4: 4}}
df = pd.DataFrame(data=d)

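You can move the columns whose names do not contain _ into the index with DataFrame.set_index, then split the remaining column names on _ with str.split (with expand=True this turns the columns into a two-level MultiIndex), and finally reshape by both levels with DataFrame.stack: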

df1 = df.set_index(['id', 'location'])
df1.columns = df1.columns.str.split('_', expand=True)
df1 = (df1.rename_axis(['category', 'subcategory'], axis=1)
          .stack([0, 1])
          .reset_index(name='amount'))
print(df1)
    id location category subcategory  amount
0    1     mall    drink      bottle     5.0
1    1     mall    drink         can     5.0
2    1     mall     food      napkin     2.0
3    1     mall     food     wrapper     6.0
4    2     mall    drink      bottle     2.0
5    2     mall    drink         can     4.0
6    2     mall     food      napkin     3.0
7    2     mall     food     wrapper     1.0
8    3     mall    drink      bottle     7.0
9    3     mall    drink         can     2.0
10   3     mall     food      napkin     0.0
11   3     mall     food     wrapper     1.0
12   4    urban    drink      bottle     0.0
13   4    urban    drink         can     3.0
14   4    urban     food      napkin     9.0
15   4    urban     food     wrapper     2.0
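
As an alternative to the set_index/stack approach above, the same reshape can be sketched with DataFrame.melt followed by a string split. This is only a minimal sketch under stated assumptions: every column other than id and location is assumed to be named "<category>_<subcategory>", and the names id_cols, long_df and result are illustrative, not part of the original answer. Because melt picks up every non-identifier column automatically, newly added columns that follow the same naming pattern should be handled without changing the code.

```python
import pandas as pd

# Same sample data as in the question (first four rows only).
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'drink_bottle': [5, 2, 7, 0],
    'drink_can': [5, 4, 2, 3],
    'food_napkin': [2, 3, 0, 9],
    'food_wrapper': [6, 1, 1, 2],
    'location': ['mall', 'mall', 'mall', 'urban'],
})

# Columns that identify a row; every other column is assumed to be
# named "<category>_<subcategory>" (an assumption of this sketch).
id_cols = ['id', 'location']

# Wide to long: one row per (id, location, original column name).
long_df = df.melt(id_vars=id_cols, var_name='name', value_name='amount')

# Split "drink_bottle" into "drink" and "bottle". n=1 splits on the first
# underscore only, so a subcategory may itself contain underscores.
long_df[['category', 'subcategory']] = (
    long_df['name'].str.split('_', n=1, expand=True)
)

# Drop the helper column, order the rows, and arrange the columns
# as in the desired output from the question.
result = (long_df.drop(columns='name')
                 .sort_values(['id', 'category', 'subcategory'])
                 .reset_index(drop=True)
                 [['id', 'category', 'subcategory', 'amount', 'location']])
print(result)
```

Unlike stack, melt does not go through a MultiIndex, so the amount column keeps its original integer dtype here.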
