在熊猫数据帧中枢转/转置某些列



我目前正在处理大型。csv文件,其中日期分配给列标题,重复指标类型和客户端ID。

这是一个类似于我正在使用的数据的小样本…

data = {'Client Id': {0: 123987, 1: 123987, 2: 123987, 3: 9871234, 4: 9871234, 5: 9871234}, 
'Metrics': {0: 'Spend', 1: 'Views', 2: 'Clicks', 3: 'Spend', 4: 'Views', 5: 'Clicks'},
'3/1/22': {0: '$0.00', 1: '6', 2: '1', 3: '$20.00', 4: '3', 5: '1'},
'3/2/22': {0: '$0.00', 1: '3', 2: '2', 3: '$10.00', 4: '2', 5: '2'},
'3/3/22': {0: '$0.00', 1: '3', 2: '3', 3: '$3.00', 4: '5', 5: '4'}}
df = pd.DataFrame(data)

当前,我的数据框架看起来像这样…

<表类>客户机Id指标3/1/223/2/223/3/22tbody><<tr>123987花0.00美元0.00美元0.00美元123987视图633123987点击1239871234花20.00美元10.00美元3.00美元9871234视图3259871234点击124

首先熔化日期列,然后pivot:

new_df = (
df
.melt(id_vars=['Client Id', 'Metrics'], var_name='Date')
.pivot(columns='Metrics', index=['Client Id', 'Date'], values='value')
.reset_index()
.rename_axis(None, axis=1)
)

输出:

>>> new_df
Client Id    Date Clicks   Spend Views
0     123987  3/1/22      1   $0.00     6
1     123987  3/2/22      2   $0.00     3
2     123987  3/3/22      3   $0.00     3
3    9871234  3/1/22      1  $20.00     3
4    9871234  3/2/22      2  $10.00     2
5    9871234  3/3/22      4   $3.00     5

您可以使用:

(df.pivot(index='Client Id', columns='Metrics')
.stack(0)
.reset_index()
.rename({'level_1': 'Date'})
.rename_axis(None, axis=1)
)

输出:

Client Id level_1 Clicks   Spend Views
0     123987  3/1/22      1   $0.00     6
1     123987  3/2/22      2   $0.00     3
2     123987  3/3/22      3   $0.00     3
3    9871234  3/1/22      1  $20.00     3
4    9871234  3/2/22      2  $10.00     2
5    9871234  3/3/22      4   $3.00     5

最新更新