I want to create variables from a DataFrame to use in Spark Scala code. I want to go through it row by row, using the column values of each row as variables each time. Can anyone help? This is the DataFrame:
+---+--------------------+------------------------------------------------------------------+---------------------------+-------------------------------------------------------------------------+----------+
|id |table1_name |table_1_path |table2_name |table_2_path |key_column|
+---+--------------------+------------------------------------------------------------------+---------------------------+-------------------------------------------------------------------------+----------+
|1 |orders-201019-002101|C:/Users/USER/Desktop/Notes/datset/week11/orders-201019-002101.csv|orders-201019-002101 - Copy|C:/Users/USER/Desktop/Notes/datset/week11/orders-201019-002101 - Copy.csv|order_id |
|2 |orders-201019-002101|C:/Users/USER/Desktop/Notes/datset/week11/orders-201019-002101.csv|orders-201019-002101 - Copy|C:/Users/USER/Desktop/Notes/datset/week11/orders-201019-002101 - Copy.csv|order_id |
+---+--------------------+------------------------------------------------------------------+---------------------------+-------------------------------------------------------------------------+----------+
I have tried using lists, but it seems difficult in Scala.
You can collect the DataFrame into a list and then use `map` to apply a schema (a case class) to each row. That way you can access the DataFrame's data as plain variables.

Example:
// Case class describing the schema of each row
case class SchemaExample(
  valueOne: Int,
  valueTwo: Int
)

val values = Seq((0, 1), (0, 1), (10, 1), (2, 1), (4, 1))
val df = spark.createDataFrame(values)

// collect() brings all rows to the driver; map each Row to the case class
val dfList = df.collect().toList.map(x => SchemaExample(x.getInt(0), x.getInt(1)))

dfList.foreach(x => {
  println(s"Print value one -> ${x.valueOne}")
  println(s"Print value Two -> ${x.valueTwo}")
  println("-----------")
})
Output:
Print value one -> 0
Print value Two -> 1
-----------
Print value one -> 0
Print value Two -> 1
-----------
Print value one -> 10
Print value Two -> 1
-----------
Print value one -> 2
Print value Two -> 1
-----------
Print value one -> 4
Print value Two -> 1
-----------
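Applied to the DataFrame in your question, the same pattern would look roughly like the sketch below. This assumes your DataFrame is called `y`, that the columns appear in the order shown in the table, and that `id` is an integer column; the `TablePair` case class and the join inside the loop are hypothetical illustrations of how you might use the per-row variables:

```scala
// Hypothetical case class matching the columns of the question's DataFrame
case class TablePair(
  id: Int,
  table1Name: String,
  table1Path: String,
  table2Name: String,
  table2Path: String,
  keyColumn: String
)

// Collect to the driver (fine here, since this is a small metadata table)
// and map each Row to the case class; indices follow the column order above.
// If `id` is actually a string column, use r.getString(0).toInt instead.
val pairs = y.collect().toList.map(r => TablePair(
  r.getInt(0),    // id
  r.getString(1), // table1_name
  r.getString(2), // table_1_path
  r.getString(3), // table2_name
  r.getString(4), // table_2_path
  r.getString(5)  // key_column
))

// Each row's column values are now available as plain variables
pairs.foreach { p =>
  val df1 = spark.read.option("header", "true").csv(p.table1Path)
  val df2 = spark.read.option("header", "true").csv(p.table2Path)
  // e.g. join the two tables of this row on its key column
  val joined = df1.join(df2, Seq(p.keyColumn))
  println(s"Row ${p.id}: joined ${p.table1Name} with ${p.table2Name} on ${p.keyColumn}")
}
```

Note that `collect()` pulls everything to the driver, so this approach is only appropriate when the DataFrame you are iterating over is small, as a configuration table like this one usually is.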