Pandas DataFrame: reshape row values into new columns (matrix-style format)



I'm new to pandas and am looking for advice on how to reshape my DataFrame.

Currently, I have a DataFrame like this:

panelist_id  type  type_count  refer_sm_count  refer_se_count  refer_non_n_count
          1    HP           2               2               1                  1
          1    PB           1               0               1                  0
          1    TN           3               0               3                  0
          2    HP           1               1               0                  0
          2    PB           2               1               1                  0

Try the pivot_table() and rename_axis() methods:

out = (df.pivot_table(index='panelist_id', columns='type', fill_value=0)
         .rename_axis(columns=[None, None], index=None))

Finally, use the map() method on the .columns attribute to join each pair of column labels into a single name:

out.columns=out.columns.map('_'.join)

Now, if you print out, you will get the output you want.
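For reference, the two steps above can be run end to end as a small sketch. The sample frame is copied from the full working example elsewhere in this thread. Because no aggfunc is passed, pivot_table falls back to its default (mean), which is harmless here only on the assumption that each (panelist_id, type) pair occurs once.

```python
import pandas as pd

# Sample data taken from the full working example in this thread.
df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})

# One row per panelist_id; the columns become a (measure, type) MultiIndex.
# Missing combinations such as (2, 'TN') are filled with 0.
out = df.pivot_table(index='panelist_id', columns='type', fill_value=0)

# Each column label is a tuple such as ('type_count', 'HP');
# join it into a single string like 'type_count_HP'.
out.columns = out.columns.map('_'.join)

print(out.columns.tolist())
print(out)
```

If a pair could occur more than once, passing an explicit aggfunc (for example 'sum') would make the intended aggregation visible rather than relying on the default.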

Another option is pivot_wider from pyjanitor:

new_df = df.pivot_wider(index='panelist_id',
                        names_from='type',
                        names_from_position='last',
                        fill_value=0)

new_df:

panelist_id  type_count_HP  type_count_PB  type_count_TN  refer_sm_count_HP  refer_sm_count_PB  refer_sm_count_TN  refer_se_count_HP  refer_se_count_PB  refer_se_count_TN  refer_non_n_count_HP  refer_non_n_count_PB  refer_non_n_count_TN
1              2              1              3                  2                  0                  0                  1                  1                  3                     1                     0                     0
2              1              2              0                  1                  1                  0                  0                  1                  0                     0                     0                     0

Full working example:

import janitor  # importing janitor registers pivot_wider on DataFrame
import pandas as pd

df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0]
})
new_df = df.pivot_wider(index='panelist_id',
                        names_from='type',
                        names_from_position='last',
                        fill_value=0)
print(new_df.to_string(index=False))

Just adding one more option:

df = df.set_index(['panelist_id', 'type']).unstack(-1, fill_value=0)
df.columns = df.columns.map('_'.join)
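One thing worth knowing about the set_index/unstack route: unlike pivot_table, it does not aggregate, so it only works when every (panelist_id, type) pair is unique. The sketch below runs the approach on the sample data from this thread and then on a copy with one deliberately duplicated row (the duplicate is an illustrative assumption, not part of the question's data).

```python
import pandas as pd

# Sample data taken from the full working example in this thread.
df = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})

# Move 'type' from the row index into the columns; fill absent pairs with 0.
wide = df.set_index(['panelist_id', 'type']).unstack(-1, fill_value=0)
wide.columns = wide.columns.map('_'.join)
print(wide)

# Hypothetical duplicate: repeat the first row so (1, 'HP') appears twice.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.set_index(['panelist_id', 'type']).unstack(-1, fill_value=0)
    duplicate_rejected = False
except ValueError as err:
    # unstack cannot reshape an index that contains duplicate entries.
    duplicate_rejected = True
    print('unstack refused duplicates:', err)
```

If duplicates are possible in your data, pivot_table with an explicit aggfunc is probably the safer choice.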

Use pivot_table, which creates MultiIndex columns:

df_p = df.pivot_table(index='panelist_id', columns='type', aggfunc=sum)

            refer_non_n_count           refer_se_count
type                       HP   PB   TN             HP   PB   TN
panelist_id
1                         1.0  0.0  0.0            1.0  1.0  3.0
2                         0.0  0.0  NaN            0.0  1.0  NaN

            refer_sm_count           type_count
type                    HP   PB   TN         HP   PB   TN
panelist_id
1                      2.0  0.0  0.0        2.0  1.0  3.0
2                      1.0  1.0  NaN        1.0  2.0  NaN

If you do want to flatten your columns, then:

df_p.columns = ['_'.join(col) for col in df_p.columns.values]
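A small caveat on the join: '_'.join only accepts strings, so the flattening above assumes every column level holds string labels, as it does with the 'HP'/'PB'/'TN' types here. As a sketch for the case where a level holds numbers, convert each label to str first. The frame below is a made-up illustration, not data from the question.

```python
import pandas as pd

# Hypothetical frame whose second column level holds integers, not strings.
cols = pd.MultiIndex.from_product([['count'], [1, 2]])
frame = pd.DataFrame([[10, 20]], columns=cols)

# '_'.join(('count', 1)) would raise TypeError, so convert each label to str.
frame.columns = ['_'.join(map(str, col)) for col in frame.columns.values]

print(frame.columns.tolist())  # ['count_1', 'count_2']
```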

First, import the libraries:

import numpy as np
import pandas as pd

Then, read your data:

data = pd.read_excel('base.xlsx')

Reshape the data with pivot_table:

data_reshaped = pd.pivot_table(data,
                               values=['type_count', 'refer_sm_count', 'refer_se_count', 'refer_non_n_count'],
                               index=['panelist_id'], columns=['type'], aggfunc=np.sum)

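However, the resulting index will not look right: the columns are a MultiIndex and panelist_id sits in the row index. So flatten the column names and reset the index: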

# Build flat names such as 'type_count_HP' from the (measure, type) column pairs
columns = [f'{measure}_{kind}' for measure, kind in data_reshaped.columns]
data_reshaped.columns = columns  # assign the new column names to the DataFrame
data_reshaped.reset_index(inplace=True)  # turn panelist_id back into a regular column
data_reshaped.fillna(0, inplace=True)  # replace NaN with 0

After that, your data will be in the shape you want.
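To try these steps without an Excel file, here is a minimal sketch that wraps them in a function. The name reshape_wide is made up for illustration, and the in-memory frame (copied from the sample data elsewhere in this thread) stands in for base.xlsx. It passes aggfunc as the string 'sum' and uses fill_value=0 in place of the separate fillna call, which is a small variation on the answer rather than its exact code.

```python
import pandas as pd


def reshape_wide(data):
    """Pivot to one row per panelist_id with flat '<measure>_<type>' columns."""
    wide = pd.pivot_table(
        data,
        values=['type_count', 'refer_sm_count', 'refer_se_count', 'refer_non_n_count'],
        index=['panelist_id'],
        columns=['type'],
        aggfunc='sum',
        fill_value=0,  # fill missing (panelist_id, type) combinations with 0
    )
    # Flatten the (measure, type) MultiIndex into names like 'type_count_HP'.
    wide.columns = [f'{measure}_{kind}' for measure, kind in wide.columns]
    # Make panelist_id an ordinary column again.
    return wide.reset_index()


# Stand-in for pd.read_excel('base.xlsx'); values copied from this thread.
data = pd.DataFrame({
    'panelist_id': [1, 1, 1, 2, 2],
    'type': ['HP', 'PB', 'TN', 'HP', 'PB'],
    'type_count': [2, 1, 3, 1, 2],
    'refer_sm_count': [2, 0, 0, 1, 1],
    'refer_se_count': [1, 1, 3, 0, 1],
    'refer_non_n_count': [1, 0, 0, 0, 0],
})

result = reshape_wide(data)
print(result.to_string(index=False))
```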
