How to pivot one column into multiple columns, with the remaining columns nested as sub-columns under each pivoted column


I have data like this:

    year energy_products  consumption_ktoe    value_ktoe
0   2009   Coal and Peat               3.0      3.300000
1   2009       Crude Oil               0.0  49079.900000
2   2009     Electricity            3338.1   3594.203691
3   2009     Natural Gas             867.8   6656.700000
4   2009          Others               0.0      0.000000
..   ...             ...               ...           ...

I want to pivot energy_products into columns, with consumption_ktoe and value_ktoe as sub-columns under each pivoted column.

So the output I want looks like this:

energy_products  Coal and Peat                 Crude Oil                     
year          consumption_ktoe  value_ktoe  consumption_ktoe  value_ktoe
0   2009                       3.0    3.300000                 3           4

energy_products   Electricity                   Natural Gas                   
year          consumption_ktoe  value_ktoe  consumption_ktoe  value_ktoe  
0   2009                       3.0    3.300000                 3           4
energy_products   Others
year          consumption_ktoe  value_ktoe  
0   2009                       3.0    3.300000 

After pivoting, I got this:

finalConImportMerge = finalConImportMerge.pivot(index='year', columns=['energy_products'])
consumption_ktoe                                           
energy_products    Coal and Peat Crude Oil Electricity Natural Gas Others   
year                                                                        
2009                         3.0       0.0      3338.1       867.8    0.0   
2010                         5.9       0.0      3633.1      1128.3    0.0   
2011                         5.7       0.0      3697.9      1184.0    0.0   
2012                        23.0       0.0      3800.4      1126.2    0.0   
2013                       129.0       0.0      3864.8      1311.6    0.0   
2014                       162.2       0.0      3989.9      1355.1    0.0   
2015                       162.9       0.0      4085.5      1340.5    0.0   
2016                       166.8       0.0      4180.9      1360.3    0.0   
2017                       600.5       0.0      4268.1      1438.4    0.0   
2018                       182.5       0.0      4338.8      1479.4    0.0   
2019                       180.9       0.0      4448.0      1482.4    0.0   
2020                       182.7       0.0      4366.2      1638.2    0.0   
value_ktoe                         
energy_products Petroleum Products Coal and Peat Crude Oil  Electricity   
year                                                                      
2009                        6474.8           3.3   49079.9  3594.203691   
2010                        7790.2           6.0   46074.3  3900.815819   
2011                        7614.0           6.1   47087.8  3955.235409   
2012                        7946.7          25.0   52191.4  4035.785689   
2013                        8475.0         266.4   48759.6  4124.117566   
2014                        8968.3         396.2   49155.7  4239.865600   
2015                        9993.3         406.1   50008.5  4322.578390   
2016                        9351.2         426.6   53410.0  4435.648008   
2017                        9149.6         898.6   58554.5  4490.609300   
2018                        9073.5         477.9   57180.3  4548.992776   
2019                       10125.2         458.0   54399.7  4655.398594   
2020                        8713.4         433.2   47047.3  4563.333718   

I tried swapping the column levels:

finalConImportMerge = finalConImportMerge.swaplevel(axis=1)

but ended up with this:

energy_products    Coal and Peat        Crude Oil ...... Coal and Peat   Crude Oil .......
year            consumption_ktoe  consumption_ktoe  ......   value_ktoe  value_ktoe .....
2009                           3                 3                    3           3

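Then I thought of merging the top-level columns, but if I use groupby I have to aggregate with a sum, so the consumption and value columns get added together, which is not what I want.

finalConImportMerge = finalConImportMerge.groupby(level=[0], axis=1).sum()

I ended up with this: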

energy_products Coal and Peat   Crude Oil   Electricity Natural Gas Others  
year                        
2009                      6.3     49079.9   6932.303691      7524.5    0.0  
....                     ....         ....              ....            ...         ..
energy_products     Petroleum Products
year                        
2009                           96774.9 
....                             ....

Is there any way to get the result I want? Or a way to group the columns without summing them?

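You were almost there. This seems a bit hacky, but I don't know of another way. (I'm using the example from the pandas pivot documentation, since you didn't provide code to recreate your example DataFrame.)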

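import pandas as pd

# Example frame from the pandas pivot documentation
df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

# Pivot both value columns at once; the columns become a two-level MultiIndex
pivoted_df = df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])

# Put 'bar' on top, then reorder the column tuples so each 'bar' value is contiguous
swapped_df = pivoted_df.swaplevel(axis=1)
goal_df = swapped_df[sorted(swapped_df)]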

goal_df looks like this:

bar   A       B       C
    baz zoo baz zoo baz zoo
foo
one   1   x   2   y   3   z
two   4   q   5   w   6   t
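The same idea can be tried on data shaped like the question's. The sketch below rebuilds only the five 2009 rows shown in the question (the full frame is not available, so this is an assumption about its layout), and it uses `sort_index(axis=1)` in place of the `sorted(...)` indexing; both should leave the columns grouped by product. The names `df` and `nested` are illustrative.

```python
import pandas as pd

# The five 2009 rows shown in the question (the rest of the data is not available)
df = pd.DataFrame({
    "year": [2009] * 5,
    "energy_products": ["Coal and Peat", "Crude Oil", "Electricity",
                        "Natural Gas", "Others"],
    "consumption_ktoe": [3.0, 0.0, 3338.1, 867.8, 0.0],
    "value_ktoe": [3.3, 49079.9, 3594.203691, 6656.7, 0.0],
})

# With no `values` argument, pivot keeps every remaining column, giving
# two-level columns: (measure, product)
pivoted = df.pivot(index="year", columns="energy_products")

# swaplevel only swaps the two levels, so the columns are (product, measure)
# but still ordered by measure; sorting the column index groups them by product
nested = pivoted.swaplevel(axis=1).sort_index(axis=1)

print(nested)
```

Swapping the levels alone leaves the columns in their original order, which is why the result in the question still showed all consumption_ktoe columns before all value_ktoe columns; the sort is the step that brings each product's two sub-columns together.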

How about pivot, reset, unstack, swap, reset, drop, set_index, transpose and reset:

finalConImportMerge.pivot(index=['year'], columns=['energy_products']).reset_index().unstack().swaplevel(1,0).reset_index().drop('level_2',axis=1).set_index('energy_products').T.reset_index(drop=True)
Out[47]: 
energy_products             Coal_and_Peat         Crude_Oil       Electricity       Natural_Gas            Others Coal_and_Peat   Crude_Oil  Electricity Natural_Gas      Others
0                  year  consumption_ktoe  consumption_ktoe  consumption_ktoe  consumption_ktoe  consumption_ktoe    value_ktoe  value_ktoe   value_ktoe  value_ktoe  value_ktoe
1                2009.0               3.0               0.0            3338.1             867.8               0.0           3.3     49079.9  3594.203691      6656.7         0.0
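Note that in the output above the measure names end up in data row 0 rather than in the column header. If the two-level header is kept instead, the sub-columns can be addressed directly. The following is a minimal sketch under assumptions: it uses only the 2009 and 2010 figures for three products that appear in the question's output, and the names `df`, `nested`, `coal` and `values` are illustrative.

```python
import pandas as pd

# 2009 and 2010 figures for three products, taken from the question's output
df = pd.DataFrame({
    "year": [2009, 2009, 2009, 2010, 2010, 2010],
    "energy_products": ["Coal and Peat", "Crude Oil", "Electricity"] * 2,
    "consumption_ktoe": [3.0, 0.0, 3338.1, 5.9, 0.0, 3633.1],
    "value_ktoe": [3.3, 49079.9, 3594.203691, 6.0, 46074.3, 3900.815819],
})

# Product on the top column level, measure on the second level
nested = (
    df.pivot(index="year", columns="energy_products")
      .swaplevel(axis=1)
      .sort_index(axis=1)
)

# Selecting a top-level label returns that product's two sub-columns
coal = nested["Coal and Peat"]

# xs picks one measure across all products (level 1 is the measure level)
values = nested.xs("value_ktoe", axis=1, level=1)

print(coal)
print(values)
```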
