How to overwrite the old DataFrame in each iteration



How can I implement a for loop in Spark that overwrites the old/original DataFrame on each iteration? Something like this:

val columns = Seq("a", "b")
val data = Seq((1, 102),
               (2, 103),
               (3, 104))
val df = data.toDF(columns: _*)
for (iteration <- 1 to 3) yield {
  val temp = df.filter($"b" >= 100).withColumn("b", exampleUDF(lit(iteration), $"b"))
  //
  // other computation stuff
  //
  df = temp // does not compile: df is a val and cannot be reassigned
}

Maybe using `var df`?

val columns = Seq("a", "b")
val data = Seq((1, 102),
               (2, 103),
               (3, 104))
var df = data.toDF(columns: _*)
for (iteration <- 1 to 3) {
  df = df.filter($"b" >= 100).withColumn("b", exampleUDF(lit(iteration), $"b"))
}
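For context on why the first snippet fails and this one works: a `val` binding cannot be reassigned, while a `var` can be rebound on each pass to the new immutable DataFrame that `filter`/`withColumn` return. The same rebinding pattern can be sketched on a plain Scala collection, with no Spark required (the `filter`/`map` chain here is a stand-in for the filter/withColumn chain, and the `b + iteration` update is a hypothetical stand-in for `exampleUDF`):

```scala
object VarLoopDemo {
  // Stand-in for the DataFrame loop: rows of (a, b) as an immutable list.
  def run(start: List[(Int, Int)]): List[(Int, Int)] = {
    var rows = start
    for (iteration <- 1 to 3) {
      // Each iteration builds a NEW immutable collection and rebinds the var,
      // mirroring `df = df.filter(...).withColumn(...)`.
      rows = rows.filter(_._2 >= 100).map { case (a, b) => (a, b + iteration) }
    }
    rows
  }

  def main(args: Array[String]): Unit =
    println(run(List((1, 102), (2, 103), (3, 104)))) // b gains +1 +2 +3 = +6
}
```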

You may want to use the `foldLeft` method instead:

val finalDF = (1 to 3).foldLeft(df) { (acc, iter) =>
  acc.filter($"b" >= 100)
     .withColumn("b", exampleUDF(lit(iter), $"b"))
}
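`foldLeft` threads the accumulator through each step without any mutable binding: each step's result becomes the next step's input, and the final accumulator is returned. The mechanics can be seen on a plain Scala collection, again with a hypothetical `b + iter` update standing in for `exampleUDF`:

```scala
object FoldDemo {
  // The accumulator `acc` plays the role of the DataFrame: every step
  // returns a new value, and the last one is the overall result.
  def run(start: List[(Int, Int)]): List[(Int, Int)] =
    (1 to 3).foldLeft(start) { (acc, iter) =>
      acc.filter(_._2 >= 100).map { case (a, b) => (a, b + iter) }
    }

  def main(args: Array[String]): Unit =
    println(run(List((1, 102), (2, 103), (3, 104)))) // same result as the var loop
}
```

Both approaches produce the same result; the fold version is often preferred because it is a single expression with no reassignment, which keeps the transformation chain explicit.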
