How to unstack some values of a dataframe (pandas/python)?



I have this dataframe:

    Type            cluster             level           value
0   Accomodation    0-1 € pr.increase   from_price      0.926047
1   Accomodation    0-1 € pr.increase   from_vol        -0.367787
2   Accomodation    0-1 € pr.increase   RIC_from_Vol    561655.141824
3   Accomodation    0-1 € pr.increase   RIC_from_Price  96439.028176
4   Accomodation    1-2 € pr.increase   from_price      1.687742
5   Accomodation    1-2 € pr.increase   from_vol        -0.264432
6   Accomodation    1-2 € pr.increase   RIC_from_Vol    248475.517577
7   Accomodation    1-2 € pr.increase   RIC_from_Price  68894.222423
...

I would like to obtain the following:

    Type            cluster             level       value           RIC
0   Accomodation    0-1 € pr.increase   from_price  0.926047        96439.028176
1   Accomodation    0-1 € pr.increase   from_vol    -0.367787       561655.141824
4   Accomodation    1-2 € pr.increase   from_price  1.687742        68894.222423
5   Accomodation    1-2 € pr.increase   from_vol    -0.264432       248475.517577
...

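That is: take the rows where level = RIC_from_Vol and put their value in a new column named RIC, on the same row as level = from_vol (and likewise RIC_from_Price next to from_price, as in the table above). How can I do this?

I have already tried unstack, but I only get errors…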

You need to create temporary columns before pivoting:

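import numpy as np
import pandas as pd

grp = ['Type', 'cluster', 'level']
# True for the from_price / from_vol rows, False for the RIC_* rows
condition = df.level.str.startswith('from')
(df
 .assign(
     # name of the column each row should end up in after the pivot
     header=np.where(condition, 'value', 'RIC'),
     # drop the RIC_ prefix so both kinds of row share a label (needs pandas >= 1.4)
     level=np.where(condition, df.level, df.level.str.removeprefix('RIC_'))
 )
 .assign(
     level=lambda d: d.level.str.lower(),  # 'from_Vol' -> 'from_vol'
     # counter that keeps the pivot index unique if a combination repeats
     uniques=lambda d: d.groupby(grp + ['header']).cumcount()
 )
 .pivot(index=['uniques'] + grp, columns='header', values='value')
 .rename_axis(columns=None)
 .droplevel('uniques')
)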
                                                     RIC     value
Type         cluster           level
Accomodation 0-1 € pr.increase from_price   96439.028176  0.926047
                               from_vol    561655.141824 -0.367787
             1-2 € pr.increase from_price   68894.222423  1.687742
                               from_vol    248475.517577 -0.264432
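
If you would rather get flat columns in the order shown in the question, one alternative to pivoting is to split the frame and merge it back on itself. The following is only a sketch under some assumptions: the sample frame is rebuilt by hand from the eight rows in the question, and the names is_ric, base, ric and result are made up for illustration. It also assumes each (Type, cluster, level) combination appears only once; with duplicates the merge would multiply rows.

```python
import pandas as pd

# Sample data copied by hand from the question (first eight rows only).
df = pd.DataFrame({
    "Type": ["Accomodation"] * 8,
    "cluster": ["0-1 € pr.increase"] * 4 + ["1-2 € pr.increase"] * 4,
    "level": ["from_price", "from_vol", "RIC_from_Vol", "RIC_from_Price"] * 2,
    "value": [0.926047, -0.367787, 561655.141824, 96439.028176,
              1.687742, -0.264432, 248475.517577, 68894.222423],
})

# Boolean mask marking the RIC_* rows.
is_ric = df["level"].str.startswith("RIC_")

# Rows that keep their place: from_price / from_vol.
base = df[~is_ric]

# RIC rows: strip the prefix and lowercase the label so it matches the
# base rows ("RIC_from_Vol" -> "from_vol"), then rename value to RIC.
ric = (
    df[is_ric]
    .assign(level=lambda d: d["level"].str[len("RIC_"):].str.lower())
    .rename(columns={"value": "RIC"})
)

# A left merge keeps every base row and attaches its matching RIC value.
result = base.merge(ric, on=["Type", "cluster", "level"], how="left")
print(result)
```

Note that merge renumbers the rows from 0, so the original index labels (0, 1, 4, 5) are not preserved here.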
