How to handle null in Spark Scala pattern matching



Here is the spark-shell code:

scala> val colName = "time_period_id"

scala> val df = spark.sql("""select time_period_id from prod.demo where time_period_id = 
202101102 """)
df: org.apache.spark.sql.DataFrame = [time_period_id: int]
scala> val result = df.agg(max(colName)).head(1)
result: Array[org.apache.spark.sql.Row] = Array([null])

scala> result(0).getInt(0) match {
     |   case null => 0
     |   case _ => result(0).getInt(0)
     | }

If the result is Array([null]) I want 0 returned, and if the result is Array([20210110]) I want 20210110 returned.

But I get this error:

<console>:33: error: type mismatch;
found   : Null(null)
required: Int
case null => 0
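The root cause is that Row.getInt(0) returns a primitive Int, and primitives can never hold null (only reference types can), so the case null pattern does not type-check; at runtime, getInt on a null cell would throw a NullPointerException anyway. One way around both problems, as a sketch not taken from the answers below and assuming the result array from the question is in scope, is to read the cell with the untyped get(0) and wrap it in Option:

```scala
import org.apache.spark.sql.Row

val row: Row = result(0)

// get(0) returns Any, so the null survives long enough for Option to absorb it
val value: Int = Option(row.get(0)) match {
  case Some(v) => v.asInstanceOf[Int] // non-null cell: unbox to Int
  case None    => 0                   // null cell: fall back to 0
}
```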

You can use lift to get the array item so that it can be handled as an Option: it returns Some(row) when an element exists at that index, and None when there is no element at all. Note that the Row inside the Some can still carry a null value, so that case has to be checked separately:

result.lift(0) match {
  case Some(row) if !row.isNullAt(0) => row.getInt(0)
  case _ => 0
}

Or, if you don't mind a few more lines in your pattern match:

result.lift(0) match {
  case Some(row) if row.isNullAt(0) => 0
  case Some(row) => row.getInt(0)
  case None => 0
}

Another option is to use Try:

import scala.util.{Try, Success}

// getInt(0) throws a NullPointerException when the cell is null,
// so Try turns the null case into a Failure and the match falls through to 0
Try(result(0).getInt(0)) match {
  case Success(date) => date
  case _ => 0
}
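The same Try can be collapsed into a one-liner with getOrElse, a minor variant not spelled out in the answer above, again assuming the result array from the question:

```scala
import scala.util.Try

// any Failure (including the NullPointerException raised for a null cell)
// falls back to the default value 0
val value: Int = Try(result(0).getInt(0)).getOrElse(0)
```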

You can also use the isNullAt() method on result(0):
scala> val result = Array(org.apache.spark.sql.Row(null))
result: Array[org.apache.spark.sql.Row] = Array([null])
scala> result(0) match {
     |   case x if x.isNullAt(0) => 0
     |   case x => x.getInt(0)
     | }
res0: Int = 0
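If you would rather avoid the null check on the driver entirely, the fallback can be pushed into the aggregation itself with coalesce. This is a sketch assuming the df and colName from the question; it is not part of the answers above:

```scala
import org.apache.spark.sql.functions.{coalesce, col, lit, max}

// coalesce substitutes 0 whenever max() yields null (e.g. when no rows match),
// so head().getInt(0) is always safe to call
val value: Int = df.agg(coalesce(max(col(colName)), lit(0))).head().getInt(0)
```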
