How to create a column from a list in a Scala/PySpark DataFrame? Error: "The feature is not supported: literal for 'List()'"



I am practicing adding a list as a column to a DataFrame. I can define a UDF, register it, and apply it to the DataFrame, but I want to try a different approach: extract a list from a DataFrame column, map over it, and then add the result back to the original DataFrame as a new column.

val df = spark.createDataFrame(Seq(("A",1),("B",2),("C",3))).toDF("Str", "Num")
df.show()
+---+---+
|Str|Num|
+---+---+
|  A|  1|
|  B|  2|
|  C|  3|
+---+---+

Collecting the list:

scala> var ls : List[String] = df.select("Str").collect().map(f=>f.getString(0)).toList
var ls: List[String] = List(A, B, C)

The transformation:

def f(x: String): String = {
  if (x == "A") x + "100"
  else x + x.length.toString
}

Applying the transformation:

scala> ls.map(x => f(x))
val res95: List[String] = List(A100, B1, C1)

Adding a column from the list: error

import org.apache.spark.sql.functions.{lit,col}
df.withColumn("new", lit(ls)).show()
error: org.apache.spark.SparkRuntimeException: The feature is not supported: literal for 'List(A100, B1, C1)' of class scala.collection.immutable.$colon$colon. 
//Please correct here
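
One possible direction for the part marked "Please correct here", sketched only and not verified: give both the original DataFrame and the mapped list a row index and join them back together. This assumes the tiny single-partition example data above, since row order is not generally guaranteed in Spark; the names rid, dfIdx and listDf are placeholders.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}
import spark.implicits._

// Index the original rows (0, 1, 2, ...)
val dfIdx = df.withColumn("rid",
  row_number().over(Window.orderBy(monotonically_increasing_id())) - 1)

// Turn the transformed list into a two-column DataFrame with the same index
val listDf = ls.map(f).zipWithIndex.toDF("new", "rid")

// Join the mapped values back onto the original DataFrame
dfIdx.join(listDf, Seq("rid")).drop("rid").show()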

Creating the UDF:

import org.apache.spark.sql.functions.udf

val myUdf = udf { x: String =>
  if (x == "A") x + "100"
  else x + x.length.toString
}
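
Since the question mentions registering a UDF as well, for completeness a registration for SQL use would presumably look like the following; the name myUdfSql and the temp view t are made up here:

spark.udf.register("myUdfSql", (x: String) => if (x == "A") x + "100" else x + x.length.toString)

// Hypothetical temp view name, just to show the registered UDF in a SQL query
df.createOrReplaceTempView("t")
spark.sql("SELECT Str, Num, myUdfSql(Str) AS new FROM t").show()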

And applying it to the df:

df.withColumn("new", myUdf(col("Str")))

Adding a new column from a list:

df.withColumn("fromListColumn", array(Seq("one", "two").map(lit(_)):_*))

Latest update