I have written a UDF that converts a Map[String,String] value into a String:
udf("mapToString", (input: Map[String,String]) => input.mkString(","))
but spark-shell gives me this error:
<console>:24: error: overloaded method value udf with alternatives:
(f: AnyRef,dataType: org.apache.spark.sql.types.DataType)org.apache.spark.sql.expressions.UserDefinedFunction <and>
...
cannot be applied to (String, Map[String,String] => String)
udf("mapToString", (input: Map[String,String]) => input.mkString(","))
Is there any way to convert a Map[String,String] column into a String column? I need this conversion because I need to save the DataFrame as a CSV file.
Assuming your DataFrame is
+---+--------------+
|id |map |
+---+--------------+
|1 |Map(200 -> DS)|
|2 |Map(300 -> CP)|
+---+--------------+
with the following schema
root
|-- id: integer (nullable = false)
|-- map: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
you can write a udf that looks like this:
def mapToString = udf((map: collection.immutable.Map[String, String]) =>
  // mkString renders each entry as "key -> value"; rewrite it as "key,value"
  map.mkString.replace(" -> ", ","))
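Note that mkString with no separator only produces a predictable result for single-entry maps like the ones in the sample data. If your maps can hold several entries, a variant that formats each pair explicitly is safer. The sketch below (the helper name mapToStringPlain and the ";" pair separator are my own choices, not from the original answer) shows the pure transformation; wrapping it with udf(mapToStringPlain _) gives the same column function:

```scala
// Sketch: format each key/value pair explicitly so that maps with
// multiple entries produce a well-defined string.
// Assumption: "," separates key from value, ";" separates pairs.
def mapToStringPlain(map: Map[String, String]): String =
  map.map { case (k, v) => s"$k,$v" }.mkString(";")
```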
and apply the udf with the withColumn API:
df.withColumn("map", mapToString($"map"))
You should end up with a final DataFrame in which the Map has been changed to a String:
+---+------+
|id |map |
+---+------+
|1 |200,DS|
|2 |300,CP|
+---+------+
root
|-- id: integer (nullable = false)
|-- map: string (nullable = true)
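As for the original error: org.apache.spark.sql.functions.udf has no overload that takes a name as its first argument, which is why the call in the question fails to compile. To register a UDF under a name (for use from SQL or selectExpr), the call is spark.udf.register. A minimal sketch, assuming Spark is on the classpath and using a local session for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session just for demonstration
val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()

// Named registration goes through spark.udf.register, not udf(name, fn)
spark.udf.register("mapToString",
  (map: Map[String, String]) => map.mkString.replace(" -> ", ","))
```

After registering, the function can be called by name, e.g. df.selectExpr("id", "mapToString(map)"). Once the map column is a plain string, df.write.csv(...) no longer rejects the schema.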