How can I implement a for loop in Spark where each iteration overwrites the old/original DataFrame? Something like this:
val columns = Seq("a", "b")
val data = Seq(
  (1, 102),
  (2, 103),
  (3, 104)
)
val df = data.toDF(columns: _*)
for (iteration <- 1 to 3) yield {
  val temp = df.filter($"b" >= 100).withColumn("b", exampleUDF(lit(iteration), $"b"))
  //
  // other computation stuff
  //
  df = temp // does not compile: df is a val and cannot be reassigned
}
Maybe by using a var df?
import spark.implicits._ // assumes an active SparkSession named `spark` is in scope

val columns = Seq("a", "b")
val data = Seq(
  (1, 102),
  (2, 103),
  (3, 104)
)
var df = data.toDF(columns: _*)
for (iteration <- 1 to 3) {
  // Each pass builds a new DataFrame and rebinds the var to it.
  df = df.filter($"b" >= 100).withColumn("b", exampleUDF(lit(iteration), $"b"))
}
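The pattern at work here (an immutable value rebound through a var on each iteration) can be sketched without Spark at all. In this hypothetical stand-in, a plain Seq of (a, b) pairs plays the role of the DataFrame and the threshold 100 plays the role of the filter:

```scala
object VarLoopSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for the DataFrame: immutable pairs of (a, b).
    var rows = Seq((1, 102), (2, 103), (3, 104))
    for (iteration <- 1 to 3) {
      // Each pass builds a NEW immutable Seq and rebinds the var to it,
      // mirroring df = df.filter(...).withColumn(...).
      rows = rows.filter(_._2 >= 100).map { case (a, b) => (a, b + iteration) }
    }
    // Each b has been incremented by 1 + 2 + 3 = 6.
    println(rows)
  }
}
```

Note that nothing is mutated in place: every iteration produces a fresh immutable value, and only the var binding changes, which is exactly what reassigning a DataFrame var does.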
You could instead use the foldLeft method:
val finalDF = (1 to 3).foldLeft(df) { (acc, iter) =>
  acc.filter($"b" >= 100)
     .withColumn("b", exampleUDF(lit(iter), $"b"))
}
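foldLeft threads the DataFrame through the iterations as an accumulator, so no var is needed: the value returned by each step becomes the acc of the next. A minimal sketch of that equivalence with a plain Int accumulator (assumed values, no Spark required):

```scala
object FoldLeftSketch {
  def main(args: Array[String]): Unit = {
    // foldLeft(initial) { (acc, iter) => next } carries the result of each
    // step forward as the accumulator for the following one.
    val viaFold = (1 to 3).foldLeft(100) { (acc, iter) => acc + iter }

    // The equivalent var-based loop.
    var viaVar = 100
    for (iter <- 1 to 3) viaVar = viaVar + iter

    println(viaFold) // 106
    println(viaVar)  // 106
  }
}
```

Both forms compute the same result; foldLeft simply makes the iteration state explicit and keeps the code purely functional.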