Given a DataFrame such as
val df = sc.parallelize(Seq((1L, 0.1), (2L, 0.2), (3L, 0.3))).toDF("k","v")
df.show
+---+---+
|  k|  v|
+---+---+
|  1|0.1|
|  2|0.2|
|  3|0.3|
+---+---+
how can I sum the values in each row into a new column named totals, so that dfTotals.show gives
+---+---+------+
|  k|  v|totals|
+---+---+------+
|  1|0.1|   1.1|
|  2|0.2|   2.2|
|  3|0.3|   3.3|
+---+---+------+
I found a simpler solution than I first expected:
val totals = ($"k" + $"v")
val dfTotals = df.withColumn("totals", totals)
Now
dfTotals.show
+---+---+------+
|  k|  v|totals|
+---+---+------+
|  1|0.1|   1.1|
|  2|0.2|   2.2|
|  3|0.3|   3.3|
+---+---+------+
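The withColumn approach generalizes to any number of columns by folding the column list with `+`. The following is a minimal sketch that makes several assumptions. It targets Spark 2.x or later with a SparkSession instead of the shell's `sc`. Every summed column is assumed to be numeric. The names `RowTotals` and `withTotals` are hypothetical and come from no library. Spark's `+` returns null when any operand is null, so nullable columns would need wrapping in `coalesce` first.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object RowTotals {
  // Hypothetical helper: adds a "totals" column equal to the sum of the given columns.
  // Assumes the columns are numeric and that `cols` is non-empty (reduce fails on an empty Seq).
  def withTotals(df: DataFrame, cols: Seq[String]): DataFrame =
    df.withColumn("totals", cols.map(col).reduce(_ + _))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("row-totals").getOrCreate()
    import spark.implicits._

    val df = Seq((1L, 0.1), (2L, 0.2), (3L, 0.3)).toDF("k", "v")
    // Sum every column of df; pass a narrower Seq to total only some columns.
    withTotals(df, df.columns.toSeq).show()

    spark.stop()
  }
}
```

For two columns this builds the same `$"k" + $"v"` expression as the solution above. The fold only pays off when the column list is long or known only at runtime.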
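Update: another approach, though less tidy (the alias is needed, otherwise the new column is named after its expression rather than totals):
df.select(df("k"), df("v"), (df("k") + df("v")).as("totals"))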
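When you use select, Spark names the computed column after its expression unless you alias it. The sketch below contrasts an aliased select with the same projection written as SQL strings through `selectExpr`. It assumes Spark 2.x or later. The object and method names (`SelectTotals`, `aliased`, `viaSql`) are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelectTotals {
  // Hypothetical: select form with an explicit alias, so the column is called "totals".
  def aliased(df: DataFrame): DataFrame =
    df.select(df("k"), df("v"), (df("k") + df("v")).as("totals"))

  // Hypothetical: the same projection expressed as SQL expression strings.
  def viaSql(df: DataFrame): DataFrame =
    df.selectExpr("k", "v", "k + v AS totals")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("select-totals").getOrCreate()
    import spark.implicits._

    val df = Seq((1L, 0.1), (2L, 0.2), (3L, 0.3)).toDF("k", "v")
    // Without .as(...), the third column's name is derived from the expression itself.
    println(df.select(df("k"), df("v"), df("k") + df("v")).columns.mkString(", "))
    println(aliased(df).columns.mkString(", ")) // k, v, totals
    viaSql(df).show()

    spark.stop()
  }
}
```

Both `aliased` and `viaSql` return the same three columns as the withColumn version. withColumn remains the shorter option when every existing column should be kept.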