Suppose my input df looks like this:
Timestamp  name  value
14:00:00   A     25
14:00:00   B     24
15:00:00   A     20
15:00:00   C     21
16:00:00   A     20
16:00:00   B     22
16:00:00   C     23
16:00:00   D     24
What you need is a pivot in PySpark. You can use pivot to reshape the data like below:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Recreate the sample input
df = spark.createDataFrame(
    [('14:00:00', 'A', 25),
     ('14:00:00', 'B', 24),
     ('15:00:00', 'A', 20),
     ('15:00:00', 'C', 21),
     ('16:00:00', 'A', 20),
     ('16:00:00', 'B', 22),
     ('16:00:00', 'C', 23),
     ('16:00:00', 'D', 24)],
    ("Timestamp", "name", "value"))

# Turn each distinct name into its own column, aggregating value with sum
df1 = df.groupBy("Timestamp").pivot("name").sum("value")
df1.show()  # this should display the expected results
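As a quick check, the pivoted result should look roughly like the comment block below (row order may vary; names with no value for a given Timestamp come out as null). If the distinct names are known up front, they can also be passed to pivot explicitly, which spares Spark an extra pass over the data to discover them — a minimal sketch:

# Optional: list the pivot values explicitly to avoid an extra pass to compute distinct names
df1 = df.groupBy("Timestamp").pivot("name", ["A", "B", "C", "D"]).sum("value")
df1.show()
# Expected output (approximately):
# +---------+----+----+----+----+
# |Timestamp|   A|   B|   C|   D|
# +---------+----+----+----+----+
# | 14:00:00|  25|  24|null|null|
# | 15:00:00|  20|null|  21|null|
# | 16:00:00|  20|  22|  23|  24|
# +---------+----+----+----+----+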