Saving a 2D list in Scala Spark

I have a 2D list, named tuppleSlides, in the following format:

List(List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7))

I have created the following schema:

val schema = StructType(
  Array(
    StructField("1", IntegerType, true),
    StructField("2", IntegerType, true),
    StructField("3", IntegerType, true),
    StructField("4", IntegerType, true),
    StructField("5", IntegerType, true),
    StructField("6", IntegerType, true),
    StructField("7", IntegerType, true),
    StructField("8", IntegerType, true),
    StructField("9", IntegerType, true),
    StructField("10", IntegerType, true)
  )
)

and I am creating the DataFrame like this:

val tuppleSlidesDF = sparkSession.createDataFrame(tuppleSlides, schema)

but it does not even compile. What is the correct way to do this?

Thank you.

You need to convert the 2D list to an RDD[Row] object before creating the DataFrame; the createDataFrame overload that takes a schema expects an RDD[Row], not a List[List[Int]]:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

// Convert each inner List to a Row, then build the DataFrame from the RDD[Row]
val rdd = sc.parallelize(tuppleSlides).map(Row.fromSeq(_))
sqlContext.createDataFrame(rdd, schema)
// res7: org.apache.spark.sql.DataFrame = [1: int, 2: int, 3: int, 4: int, 5: int, 6: int, 7: int, 8: int, 9: int, 10: int]
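
To sanity-check the result, you can print the schema and contents (a usage sketch; printSchema and show are standard DataFrame actions):

val df = sqlContext.createDataFrame(rdd, schema)
df.printSchema()  // all ten columns should come back as nullable integers
df.show()         // prints the four identical sample rows under headers "1" through "10"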

Also note that in Spark 2.x, sqlContext is replaced by spark:

spark.createDataFrame(rdd, schema)
// res1: org.apache.spark.sql.DataFrame = [1: int, 2: int ... 8 more fields]
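
Putting it all together with the question's own variable names, here is a minimal self-contained sketch assuming Spark 2.x (the local session setup is illustrative, and the schema is generated in a loop since all ten columns are nullable IntegerType):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

val sparkSession = SparkSession.builder().appName("tuppleSlides").master("local[*]").getOrCreate()

val tuppleSlides = List(List(10,4,2,4,5,2,6,2,5,7), List(10,4,2,4,5,2,6,2,5,7))

// Build StructField "1" through "10" programmatically instead of typing each one out
val schema = StructType((1 to 10).map(i => StructField(i.toString, IntegerType, nullable = true)))

// In Spark 2.x the SparkContext is available from the session
val rdd = sparkSession.sparkContext.parallelize(tuppleSlides).map(Row.fromSeq(_))
val tuppleSlidesDF = sparkSession.createDataFrame(rdd, schema)

tuppleSlidesDF.show()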
