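I'm practicing adding a list to a DataFrame column. I know I can define a UDF, register it, and apply it to the DataFrame, but I want to try a different approach: extract the list from a DataFrame column, map it, and then re-add the result to the original DataFrame as a new column.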
val df = spark.createDataFrame(Seq(("A",1),("B",2),("C",3))).toDF("Str", "Num")
+---+---+
|Str|Num|
+---+---+
| A| 1|
| B| 2|
| C| 3|
+---+---+
Collected list:
scala> var ls : List[String] = df.select("Str").collect().map(f=>f.getString(0)).toList
var ls: List[String] = List(A, B, C)
Transformation:
def f(x : String) : String = {
if (x=="A") {x + "100"}
else {x + x.length.toString}
}
Applying the transformation:
scala> ls.map(x => f(x))
val res95: List[String] = List(A100, B1, C1)
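Because the mapped values are derived from `Str` itself, one way to re-add them without depending on row order is to pair each original value with its result and look that pair up per row. A minimal sketch, assuming Spark 3.x and a local `SparkSession` (the `master("local[*]")` setting, the app name, and the names `lookup` and `result` are illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, element_at, typedLit}

// Local session for the sketch; in spark-shell, getOrCreate() returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("map-lookup-sketch").getOrCreate()
import spark.implicits._

val df = Seq(("A", 1), ("B", 2), ("C", 3)).toDF("Str", "Num")

// Same transformation as f in the question.
def f(x: String): String =
  if (x == "A") x + "100" else x + x.length.toString

// Collect Str to the driver (only sensible for small data) and build original -> transformed pairs.
val ls: List[String] = df.select("Str").as[String].collect().toList
val lookup: Map[String, String] = ls.map(x => x -> f(x)).toMap

// typedLit turns the Scala Map into a map<string,string> literal column;
// element_at then looks up each row's own Str value in that map.
val result = df.withColumn("new", element_at(typedLit(lookup), col("Str")))
result.show()
```

The lookup is keyed by value rather than by position, so it stays correct even if Spark returns rows in a different order.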
Adding the column from the list fails:
import org.apache.spark.sql.functions.{lit,col}
df.withColumn("new", lit(ls)).show()
error: org.apache.spark.SparkRuntimeException: The feature is not supported: literal for 'List(A100, B1, C1)' of class scala.collection.immutable.$colon$colon.
//Please correct here
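As the error message shows, `lit` rejects a Scala `List`. What to use instead depends on the intended result: `typedLit` puts the whole list into every row as an array, while a join attaches one mapped value to each row. A minimal sketch of both, assuming Spark 3.x and a local `SparkSession` (names such as `mapped`, `pairs`, `withArray`, and `withValue` are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.typedLit

val spark = SparkSession.builder().master("local[*]").appName("list-column-sketch").getOrCreate()
import spark.implicits._

val df = Seq(("A", 1), ("B", 2), ("C", 3)).toDF("Str", "Num")

// Same transformation as f in the question.
def f(x: String): String =
  if (x == "A") x + "100" else x + x.length.toString

val ls: List[String] = df.select("Str").as[String].collect().toList
val mapped: List[String] = ls.map(f)

// Option 1: typedLit derives the Spark type from the Scala type and builds an
// array<string> literal. Every row gets the SAME whole array.
val withArray = df.withColumn("new", typedLit(mapped))
withArray.show(false)

// Option 2: one value per row. Zip original and mapped values into a small DataFrame
// and join it back on Str, so the result does not depend on row order.
// distinct drops duplicate pairs in case Str contains repeated values.
val pairs = ls.zip(mapped).distinct.toDF("Str", "new")
val withValue = df.join(pairs, Seq("Str"), "left")
withValue.show()
```

Option 2 is usually what "re-adding a mapped list" means; option 1 is the direct replacement for `lit(ls)`.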
Creating the UDF:
import org.apache.spark.sql.functions.udf
val myUdf = udf { x: String =>
if (x=="A") {x + "100"}
else {x + x.length.toString}
}
And applying it to df:
df.withColumn("new", myUdf(col("Str")))
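The same rule can also be written with Spark's built-in column functions, which Catalyst can optimize (a UDF is opaque to the optimizer). A sketch under the same assumptions (Spark 3.x, a local session; `newCol` and `result` are illustrative names). One caveat: Spark's `length` counts characters, while Scala's `String.length` counts UTF-16 code units, so the two can differ only for characters outside the Basic Multilingual Plane.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, length, lit, when}

val spark = SparkSession.builder().master("local[*]").appName("builtin-sketch").getOrCreate()
import spark.implicits._

val df = Seq(("A", 1), ("B", 2), ("C", 3)).toDF("Str", "Num")

// Same rule as myUdf: "A" becomes "A100", anything else becomes Str followed by its length.
val newCol = when(col("Str") === "A", concat(col("Str"), lit("100")))
  .otherwise(concat(col("Str"), length(col("Str")).cast("string")))

val result = df.withColumn("new", newCol)
result.show()
```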
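Adding a new column from a list of literals (the same array on every row):
import org.apache.spark.sql.functions.array
df.withColumn("fromListColumn", array(Seq("one", "two").map(lit(_)):_*))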