如何在python-pyspark中使用pivot进行转换



我在excel中有以下数据,如下

col_1   col_2   col_3   00:00   01:00   02:00   03:00   04:00   06:00
0   cisc    am-01   RSR     11.00   99.72   99.75   99.75   99.83   99.82
1   cisc    am-01   CSRSR  12.00    99.99   99.99   99.99   99.99   99.99
2   cisc    am-01   DBRSR   13.00   99.97   99.95   99.95   99.95   99.98
3   cisc    am-01   DSRSR   14.99   98.45   98.71   98.79   98.83   98.72
4   cisc-2  am-02   RSR     12.00   12.34   10.00   11.22   22.34   99.74
5   cisc-2  am-02   CSRSR   12.00   23.99   99.99   99.99   99.99   99.56
6   cisc-2  am-02   DBRSR   18.00   34.99   99.95   99.95   99.95   99.98
7   cisc-2  am-02   DSRSR   98.00   67.99   98.71   98.79   98.83   98.72

以及作为的所需结果

col_1   col_2   time    RSR     CSRSR   DBRSR   DSRSR
0   cisc    am-01   00:00   11.00   12.00   13.00   14.99
1   cisc    am-01   01:00   99.72   99.99   99.97   98.45
2   cisc    am-01   02:00   99.75   99.99   99.95   98.71
3   cisc    am-01   03:00   99.75   99.99   99.95   98.79
4   cisc    am-01   04:00   99.83   99.99   99.95   98.83
5   cisc    am-01   06:00   99.82   99.99   99.98   98.72
6   cisc-2  am-02   00:00   12.00   12.00   18.00   98.00
7   cisc-2  am-02   01:00   12.34   23.99   34.99   67.99
8   cisc-2  am-02   02:00   10.00   99.99   99.95   98.71
9   cisc-2  am-02   03:00   11.22   99.99   99.95   98.79
10  cisc-2  am-02   04:00   22.34   99.99   99.95   98.83
11  cisc-2  am-02   06:00   99.74   99.56   99.98   98.72

我尝试将数据分为两部分,第一部分用于名为df的非转置列另一个用于转置df2


import pandas as pd
import numpy as np
pd.set_option('display.width', 100000)
df= pd.read_excel("./t2.xlsx")
df
df1 = df.iloc[:,[0,1]]
df1
df2 = df.iloc[:,[2,3,4,5,6,7]].set_index('col_3').T
df2

我们能连接这些数据帧吗?从而可以实现

IIUC,可以使用stack()unstack():

(df.set_index(['col_1','col_2','col_3'])
.rename_axis(columns='time')
.stack()
.unstack('col_3')
.reset_index()
)

输出:

col_3   col_1  col_2   time  CSRSR  DBRSR  DSRSR    RSR
0        cisc  am-01  00:00  12.00  13.00  14.99  11.00
1        cisc  am-01  01:00  99.99  99.97  98.45  99.72
2        cisc  am-01  02:00  99.99  99.95  98.71  99.75
3        cisc  am-01  03:00  99.99  99.95  98.79  99.75
4        cisc  am-01  04:00  99.99  99.95  98.83  99.83
5        cisc  am-01  06:00  99.99  99.98  98.72  99.82
6      cisc-2  am-02  00:00  12.00  18.00  98.00  12.00
7      cisc-2  am-02  01:00  23.99  34.99  67.99  12.34
8      cisc-2  am-02  02:00  99.99  99.95  98.71  10.00
9      cisc-2  am-02  03:00  99.99  99.95  98.79  11.22
10     cisc-2  am-02  04:00  99.99  99.95  98.83  22.34
11     cisc-2  am-02  06:00  99.56  99.98  98.72  99.74

最新更新