我有一个具有多个组合的数据帧,它们各自的排名如下所示:
+--------------+--------------+--------------+------+
| Combination1 | Combination2 | Combination3 | Rank |
+--------------+--------------+--------------+------+
| VAR1 : VAL11 | VAR2 : VAL21 | VAR3 : VAL31 | 1 |
| VAR1 : VAL12 | VAR4 : VAL41 | VAR2 : VAL22 | 2 |
+--------------+--------------+--------------+------+
我已经将每一列分为两列,如下所示:
+-----------+-----------+-----------+-----------+-----------+-----------+-------+
| Comb_Col1 | Comb_Val1 | Comb_Col2 | Comb_Val2 | Comb_Col3 | Comb_Val3 | Rank |
+-----------+-----------+-----------+-----------+-----------+-----------+-------+
| VAR1 | VAL11 | VAR2 | VAL21 | VAR3 | VAL31 | 1 |
| VAR1 | VAL12 | VAR4 | VAL41 | VAR2 | VAL22 | 2 |
+-----------+-----------+-----------+-----------+-----------+-----------+-------+
我试图实现的是(在支点(:
+------+-------+-------+-------+-------+
| Rank | VAR1 | VAR2 | VAR3 | VAR4 |
+------+-------+-------+-------+-------+
| 1 | VAL11 | VAL21 | VAL31 | NaN |
| 2 | VAL12 | VAL22 | NaN | VAL41 |
+------+-------+-------+-------+-------+
有多个变量,每个变量都有多个值。我正在尝试的是在列中标记所有变量的名称,并在行中分别标记它们的值。如果没有,请将其留空或为Null
我尝试使用以下代码(但无法实现所需的输出(:
(df.set_index('Rank')
.groupby(level='Rank')
.apply(lambda g: g.apply(pd.value_counts))
.unstack(level=1)
.fillna(0))
(df.set_index('Rank').stack()
.groupby(level=[0,1])
.value_counts()
.unstack(level=[1,2])
.fillna(0)
.sort_index(axis=1))
df_out = pd.wide_to_long(df, ['Comb_Col', 'Comb_Val'], 'Rank', 'No').reset_index()
.pivot('Rank', 'Comb_Col', 'Comb_Val')
如有任何帮助,我们将不胜感激。
提前感谢!
我尝试将列和值与秩一起分离,重命名它们,然后合并。
+-----------+-----------+-------+ +-----------+-----------+-------+
| Comb_Col1 | Comb_Val1 | Rank | | Comb_Col1 | Comb_Val1 | Rank |
+-----------+-----------+-------+ +-----------+-----------+-------+
| VAR1 | VAL11 | 1 | | VAR3 | VAL31 | 1 |
| VAR1 | VAL12 | 2 | | VAR2 | VAL22 | 2 |
+-----------+-----------+-------+ +-----------+-----------+-------+
+-----------+-----------+-------+
| Comb_Col1 | Comb_Val1 | Rank |
+-----------+-----------+-------+
| VAR2 | VAL21 | 1 |
| VAR4 | VAL41 | 2 |
+-----------+-----------+-------+
之后,简单的透视查询就起作用了!
df.pivot(index = Rank,
columns ='Comb_Col1',
values = ['Comb_Val1']
)