Spark Scala function call



The requirement is as follows:

val value = Array("id", "sd", "cd")                              // list of columns
val cols_list = Array("cd", "id", "tm", "no", "in", "ts", "nm")  // list of columns

abcd is the schema name. I need the columns in value, plus the columns from cols_list that are not in value.

val alter = df.select(value + ("abcd." + x.toUpperCase() for x <- cols_list if x.toUpperCase() not in value)).where(df.status =="ALERT")

The error is that it cannot resolve x. The requirement is a DataFrame with a selection condition and a loop that does a "not in" check before the action. Any ideas/suggestions?

I tried the following: val diff_cols = value diff cols_list -- but that does not look like a good idea.

val alter = df.select(value + ("abcd." + diff_cols).where(df.status == "ALERT")

But the problem I see now is that [Ljava.lang.String;@6cc9bbea is being passed instead of the column names, and it fails.

Is there another solution you can suggest?

Please check the code below.

In Spark you can access columns without the schema name.

scala> val value = Array("id","sd","cd")
value: Array[String] = Array(id, sd, cd)
scala> val cols_list = Array("cd","id","tm","no","in","ts","nm")
cols_list: Array[String] = Array(cd, id, tm, no, in, ts, nm)
scala> val columns = value ++ cols_list.diff(value)
columns: Array[String] = Array(id, sd, cd, tm, no, in, ts, nm)
scala> val schema = "abcd"
schema: String = abcd
scala> columns.map(column => s"${schema}.${column}") // This step is not required; in Spark you can access columns without the schema name. If you still want the prefix, you can build it like this.
res14: Array[String] = Array(abcd.id, abcd.sd, abcd.cd, abcd.tm, abcd.no, abcd.in, abcd.ts, abcd.nm)
scala> df.select(columns.head,columns.tail:_*).where($"status" === "ALERT")
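
For completeness, here is a minimal, self-contained sketch of the whole flow outside the REPL. The DataFrame, its contents, and the status values are made up for illustration, and the final aliased select is only an assumption about how the "abcd." prefix could be used if it is really wanted (by aliasing the DataFrame rather than using a real schema name).

import org.apache.spark.sql.SparkSession

object ColumnDiffExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("column-diff-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy DataFrame with the columns used above plus a status column (made-up data).
    val df = Seq(
      ("1", "a", "x", "t1", "n1", "i1", "ts1", "nm1", "ALERT"),
      ("2", "b", "y", "t2", "n2", "i2", "ts2", "nm2", "OK")
    ).toDF("id", "sd", "cd", "tm", "no", "in", "ts", "nm", "status")

    val value     = Array("id", "sd", "cd")
    val cols_list = Array("cd", "id", "tm", "no", "in", "ts", "nm")

    // Columns from `value`, followed by the columns of `cols_list` not already in `value`.
    val columns = value ++ cols_list.diff(value)

    // Select the combined column list and keep only ALERT rows.
    df.select(columns.head, columns.tail: _*)
      .where($"status" === "ALERT")
      .show()

    // Only if the "abcd." prefix is really needed: alias the DataFrame as "abcd"
    // and reference the columns through that alias (assumption, not required).
    val prefixed = columns.map(c => s"abcd.$c")
    df.as("abcd")
      .select(prefixed.head, prefixed.tail: _*)
      .where($"status" === "ALERT")
      .show()

    spark.stop()
  }
}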
