How to convert an RDD of Maps to a DataFrame



I have an RDD of Maps and I want to convert it to a DataFrame. This is the input format of the RDD:

val mapRDD: RDD[Map[String, String]] = sc.parallelize(Seq(
   Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
   Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
   Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
   Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
   Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))

Is there any way to convert it to a DataFrame, something like this:

 val df=mapRDD.toDF

df.show

empid   empName     depId
12      Rohan       201
13      Ross        201
14      Richard     401
15      Michale     501
16      John        701

You can easily convert it to a Spark DataFrame. Here is code that solves the problem:

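// Needed for toDF; assumes a SparkSession named `spark`, as in spark-shell.
import spark.implicits._

val mapRDD = sc.parallelize(Seq(
   Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
   Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
   Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
   Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
   Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))
// Take the column names from the first map; this assumes every map has the same three keys.
val columns = mapRDD.take(1).flatMap(a => a.keys)
// Look each value up by key, so the values always line up with the column names
// instead of depending on the iteration order of each Map.
val resultantDF = mapRDD.map { value =>
      (value(columns(0)), value(columns(1)), value(columns(2)))
      }.toDF(columns: _*)
resultantDF.show()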

Output:

+-----+-------+-----+
|empid|empName|depId|
+-----+-------+-----+
|   12|  Rohan|  201|
|   13|   Ross|  201|
|   14|Richard|  401|
|   15|Michale|  501|
|   16|   John|  701|
+-----+-------+-----+
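
The tuple approach above is tied to exactly three columns. If the number of keys varies, or some maps may be missing a key, one possible alternative is to build `Row` objects against an explicit schema. The following is only a sketch under some assumptions: the object name `MapRddToDataFrame` and the helper `toDataFrame` are made up for illustration, every column is treated as a nullable string, and the column list is passed in by the caller rather than inferred from the data.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper object, not part of Spark.
object MapRddToDataFrame {

  // Builds a DataFrame with one string column per entry in `columns`.
  // A key that is missing from a map becomes null in that row.
  def toDataFrame(spark: SparkSession,
                  rdd: RDD[Map[String, String]],
                  columns: Seq[String]): DataFrame = {
    val schema = StructType(columns.map(name => StructField(name, StringType, nullable = true)))
    // Values are looked up by key, so they follow the order of `columns`.
    val rows = rdd.map(m => Row.fromSeq(columns.map(c => m.get(c).orNull)))
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("MapRddToDataFrame")
      .getOrCreate()

    val mapRDD = spark.sparkContext.parallelize(Seq(
      Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
      Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
      Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401")))

    val df = toDataFrame(spark, mapRDD, Seq("empid", "empName", "depId"))
    df.show()

    spark.stop()
  }
}
```

Passing the column list explicitly avoids depending on `take(1)` and fixes the column order up front. If the values should be numeric rather than strings, they would still need to be cast afterwards.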
