Global condition for multiple withColumn + when statements on a Spark DataFrame



Consider this df:

+----+------+
|cond|chaine|
+----+------+
|   0|   TF1|
|   1|   TF1|
|   1|   TNT|
+----+------+

I want to apply the following withColumn statements, but only to rows where cond == 1:

df.withColumn("New", when($"chaine" === "TF1", "YES!"))
  .withColumn("New2", when($"chaine" === "TF1", "YES2!"))
  .withColumn("New3", when($"chaine" === "TF1", "YES3!"))
  .withColumn("New4", when($"chaine" === "TF1", "YES4!"))

I can't use .filter, because I still want the rows with cond =!= 1 in the output.

I could do it by adding my condition to every statement:

df.withColumn("New", when($"chaine" === "TF1" && $"cond" === 1, "YES!"))
  .withColumn("New2", when($"chaine" === "TF1" && $"cond" === 1, "YES2!"))
  .withColumn("New3", when($"chaine" === "TF1" && $"cond" === 1, "YES3!"))
  .withColumn("New4", when($"chaine" === "TF1" && $"cond" === 1, "YES4!"))

The problem is that I have a lot of new columns, and I would like a cleaner solution (something like a global condition?).

Thanks.

A simple idea using a bit of syntactic sugar:

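import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.when

// Like `when`, but only fires on rows where cond equals n
def whenCondIs(n: Int)(condition: Column, value: Any): Column =
  when(condition && $"cond" === n, value)
def whenOne(condition: Column, value: Any): Column =
  whenCondIs(1)(condition, value)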

Then:

df.withColumn("New", whenOne($"chaine" === "TF1", "YES!"))
  .withColumn("New2", whenOne($"chaine" === "TF1", "YES2!"))
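
Another way to get a global condition, offered as a minimal sketch rather than a definitive implementation: leave each existing when(...) untouched and wrap it in a second when that checks cond once. The object name GuardedWhenSketch and the helper onlyWhenCondIsOne are hypothetical names introduced here for illustration; the sample data mirrors the df from the question, and a local SparkSession is assumed.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object GuardedWhenSketch {
  // Hypothetical helper: keeps `column` on rows where cond == 1, null elsewhere.
  def onlyWhenCondIsOne(column: Column): Column =
    when(col("cond") === 1, column)

  // The inner when(...) calls are the ones from the question, unchanged.
  def addColumns(df: DataFrame): DataFrame =
    df.withColumn("New", onlyWhenCondIsOne(when(col("chaine") === "TF1", "YES!")))
      .withColumn("New2", onlyWhenCondIsOne(when(col("chaine") === "TF1", "YES2!")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder().master("local[1]").appName("guarded-when").getOrCreate()
    import spark.implicits._

    val df = Seq((0, "TF1"), (1, "TF1"), (1, "TNT")).toDF("cond", "chaine")
    addColumns(df).show()
    spark.stop()
  }
}
```

Because when without otherwise yields null, only the row with cond == 1 and chaine == "TF1" gets a value; the other rows stay in the output with null in the new columns.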

You can build a list that maps each new column to its condition and value, then use foldLeft to add them to the DataFrame. Something like this:

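import org.apache.spark.sql.functions.{col, expr, lit, when}

// (new column name, SQL condition, value)
val newCols = Seq(
  ("New", "chaine='TF1'", "YES!"),
  ("New2", "chaine='TF1'", "YES2!"),
  ("New3", "chaine='TF1'", "YES3!"),
  ("New4", "chaine='TF1'", "YES4!")
)
// The cond === 1 check is written once and applied to every new column
val df1 = newCols.foldLeft(df)((acc, x) =>
  acc.withColumn(x._1, when(expr(x._2) && col("cond") === 1, lit(x._3)))
)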
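
The same mapping can also hold typed Column conditions instead of SQL strings, with the cond check written once and all new columns added in a single select. This is only a sketch under assumptions: GuardedSelectSketch, guard, and specs are hypothetical names chosen for illustration, and the sample data mirrors the df from the question.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object GuardedSelectSketch {
  // The shared guard, written once.
  val guard: Column = col("cond") === 1

  // (new column name, row-level condition, value)
  val specs: Seq[(String, Column, String)] = Seq(
    ("New", col("chaine") === "TF1", "YES!"),
    ("New2", col("chaine") === "TF1", "YES2!")
  )

  def addColumns(df: DataFrame): DataFrame = {
    // Build every new column with the guard combined into its condition.
    val newCols: Seq[Column] = specs.map { case (name, condition, value) =>
      when(guard && condition, lit(value)).as(name)
    }
    // Keep all existing columns and append the new ones in one projection.
    df.select(col("*") +: newCols: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder().master("local[1]").appName("guarded-select").getOrCreate()
    import spark.implicits._

    val df = Seq((0, "TF1"), (1, "TF1"), (1, "TNT")).toDF("cond", "chaine")
    addColumns(df).show()
    spark.stop()
  }
}
```

Typed conditions are checked by the compiler rather than parsed at runtime, and a single select avoids stacking one projection per withColumn call when the list of new columns is long.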
