我想将新列追加(添加(到具有多个列的现有数据帧。
val a = Seq(
("10", "MILLER", "1300", "2017-11-03"),
("30", "Martin", "1250", "2017-11-21")).toDF("dept_no","emp_name","sal","date")
scala> a.show
+-------+--------+----+----------+
|dept_no|emp_name| sal| date|
+-------+--------+----+----------+
| 10| MILLER|1300|2017-11-03|
| 30| Martin|1250|2017-11-21|
+-------+--------+----+----------+
使用上面的数据帧,我想添加集合的每个元素(无论是常规的 Scala 集合还是另一个单列数据帧(,例如
val lst = List("10", "Susan")
如何将上面lst
元素添加到a
数据帧的行中(每行一个元素(?
让我们将lst
转换为数据帧:
val lst = List("10", "Susan").toDF
您可以使用zip
RDD
方法:
import org.apache.spark.sql.Row
val data = a.rdd.zip(lst.rdd).map { case (l, r) => Row.fromSeq(l.toSeq ++ r.toSeq) }
import org.apache.spark.sql.types.StructType
val schema = StructType(a.schema.fields ++ lst.schema.fields)
val solution = spark.createDataFrame(data, schema)
scala> solution.show
+-------+--------+----+----------+-----+
|dept_no|emp_name| sal| date|value|
+-------+--------+----+----------+-----+
| 10| MILLER|1300|2017-11-03| 10|
| 30| Martin|1250|2017-11-21|Susan|
+-------+--------+----+----------+-----+