I need to transpose the two-column DataFrame below into a single row (i.e., go from long to wide format).
+--------+-----+
| udate| cc|
+--------+-----+
|20090622| 458|
|20090624|31068|
|20090626| 151|
|20090629| 148|
|20090914| 453|
+--------+-----+
I need it in this format:
+-----+--------+--------+--------+----
|udate|20090622|20090624|20090626| ...
+-----+--------+--------+--------+----
|   cc|     458|   31068|     151| etc.
I ran this:
result_df.groupBy($"udate").pivot("udate").agg(max($"cc")).show()
but I end up with a matrix where all rows are pivoted into all columns:
+--------+--------+--------+--------+--------+--------+---
| udate|20090622|20090624|20090626|20090629|20090703|200
+--------+--------+--------+--------+--------+--------+---
|20090622| 458| null| null| null| null|
|20090624| null| 31068| null| null| null|
|20090626| null| null| 151| null| null|
|20090629| null| null| null| 148| null|
|20090703| null| null| null| null| 362|
|20090704| null| null| null| null| null|
|20090715| null| null| null| null| null|
|20090718| null| null| null| null| null|
|20090721| null| null| null| null| null|
|20090722| null| null| null| null| null|
I expected that pivoting a one-column dataset would produce a one-row pivoted dataset.
How do I change the pivot call so the result set is pivoted into a single row?
tl;dr In Spark 2.4.0 it simply boils down to using groupBy with no grouping columns.
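For reproducibility, here is a minimal sketch that builds the input DataFrame d used below (column names and values are taken from the question; it assumes an active SparkSession named spark):

import org.apache.spark.sql.functions.{first, lit}
import spark.implicits._ // assumes an active SparkSession named `spark`

// Recreate the two-column input from the question.
val d = Seq(
  ("20090622", 458),
  ("20090624", 31068),
  ("20090626", 151),
  ("20090629", 148),
  ("20090914", 453)
).toDF("udate", "cc")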
val solution = d.groupBy().pivot("udate").agg(first("cc"))
scala> solution.show
+--------+--------+--------+--------+--------+
|20090622|20090624|20090626|20090629|20090914|
+--------+--------+--------+--------+--------+
| 458| 31068| 151| 148| 453|
+--------+--------+--------+--------+--------+
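Note that first works here only because each udate occurs exactly once, so the single (global) group holds one cc value per pivoted column. If dates could repeat, swap in an explicit aggregate; a sketch using sum (my addition, not part of the original answer):

import org.apache.spark.sql.functions.sum
// Collapse duplicate dates into one cell instead of picking an arbitrary row.
val summed = d.groupBy().pivot("udate").agg(sum("cc"))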
If you do need a first column carrying the name, just prepend it as a literal column:
val betterSolution = solution.select(lit("cc") as "udate", $"*")
scala> betterSolution.show
+-----+--------+--------+--------+--------+--------+
|udate|20090622|20090624|20090626|20090629|20090914|
+-----+--------+--------+--------+--------+--------+
| cc| 458| 31068| 151| 148| 453|
+-----+--------+--------+--------+--------+--------+
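As a side note, when the pivot values are known up front you can pass them to pivot explicitly, which spares Spark an extra job to collect the distinct udate values first. The date list below is illustrative:

// Hypothetical, hard-coded list of expected pivot columns.
val knownDates = Seq("20090622", "20090624", "20090626", "20090629", "20090914")
val fasterSolution = d.groupBy().pivot("udate", knownDates).agg(first("cc"))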