How to join on an array field



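I have two datasets, distance and customer, and I want to find out whether the id in the customer dataset exists in id_5 of the distance dataset, where id_5 is an array of ids. Any help is much appreciated.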

case class Distance(zip: String, id_5: Array[Int])
val dist = Seq(Distance("72712",Array(72713,72714,72715)))
val distDS=dist.toDS()
case class Customer (cust_id: Int, id: String)
val c = Seq(Customer(1,"72713"),Customer(2,"72714"),Customer(3,"72720"))
val custDS = c.toDS()
val res = distDS.joinWith(custDS,distDS.col("id_5"(??????)) === custDS.col("id"))

Use array_contains:

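import org.apache.spark.sql.functions.expr
// id_5 is Array[Int] and Customer.id is a String, so cast id to INT before the membership test
distDS.joinWith(custDS, expr("array_contains(id_5, CAST(id AS INT))"))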
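
For anyone who wants to try this end to end, here is a minimal sketch, assuming a local SparkSession and the sample data from the question. The object name ArrayJoinSketch, the helpers containsJoin and explodeJoin, and the member_id column are made up for illustration. It tests membership of the customer's id column (cast to Int, since id_5 holds Ints), because that is the field the question asks about. The second helper is a swapped-in alternative rather than part of the answer above: it explodes the array and does a plain equi-join.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, explode, expr}

case class Distance(zip: String, id_5: Array[Int])
case class Customer(cust_id: Int, id: String)

// Hypothetical wrapper object; the names here are illustrative only.
object ArrayJoinSketch {

  // Join on array membership. Customer.id is a String, so it is cast
  // to INT to match the element type of id_5.
  def containsJoin(distDS: Dataset[Distance],
                   custDS: Dataset[Customer]): Dataset[(Distance, Customer)] =
    distDS.joinWith(custDS, expr("array_contains(id_5, CAST(id AS INT))"))

  // Alternative: produce one row per array element, then equi-join.
  // member_id is an assumed column name for the exploded element.
  def explodeJoin(distDS: Dataset[Distance],
                  custDS: Dataset[Customer]): DataFrame = {
    val exploded = distDS.select(col("zip"), explode(col("id_5")).as("member_id"))
    exploded.join(custDS, col("member_id") === col("id").cast("int"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-join-sketch")
      .getOrCreate()
    import spark.implicits._

    val distDS = Seq(Distance("72712", Array(72713, 72714, 72715))).toDS()
    val custDS = Seq(Customer(1, "72713"), Customer(2, "72714"), Customer(3, "72720")).toDS()

    containsJoin(distDS, custDS).show(false)
    explodeJoin(distDS, custDS).show(false)

    spark.stop()
  }
}
```

With the sample data, customers 1 and 2 should match and customer 3 should not. The explode variant gives Spark an equality condition to join on, which may matter on larger inputs, but it returns a flat DataFrame instead of typed (Distance, Customer) pairs.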
