How can I check whether a DataFrame column is null or empty in Spark?
For example:
type IdentifiedDataFrame = (SourceIdentfier, DataFrame)

def splitRequestIntoDFsWithAndWithoutTransactionId(df: DataFrame): Seq[IdentifiedDataFrame] = {
  Seq(
    (DeltaTableStream(RequestWithTransactionId), df.filter(col(RequestLocationCodeColName).isNull
      && col(ServiceNumberColName).isNull
      && col(DateOfServiceColName).isNull
      && col(TransactionIdColName).isNotNull)),
    (DeltaTableStream(RequestWithoutTransactionId), df.filter(col(RequestLocationCodeColName).isNotNull
      && col(ServiceNumberColName).isNotNull
      && col(DateOfServiceColName).isNotNull))
  )
}
Note: this code only checks the columns for null values; I want to treat an empty string the same way as null. Please help.
You can combine isNull with an equality check against the empty string, build one condition per column, and pass the combined condition to filter, like this:
import org.apache.spark.sql.functions.{col, lit}

val columns = List("column1", "column2")
// A column is "missing" when it is NULL or a null-safe equal to ""
val filter = columns
  .map(c => col(c).isNull || (col(c) <=> lit("")))
  .reduce(_ and _)
df.filter(filter)
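To make this concrete, here is a minimal self-contained sketch. The helper name `isNullOrEmpty`, the object name, and the sample data are my own illustration, not from the original post; it simply packages the per-column condition above so it can be reused in filters such as the asker's split function.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object NullOrEmptyDemo {
  // Hypothetical helper: true when the column is NULL or a null-safe
  // equal to the empty string. `<=>` returns false (not NULL) for NULLs,
  // so the two sides cover the two "missing" cases without overlap.
  def isNullOrEmpty(c: Column): Column = c.isNull || (c <=> lit(""))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-or-empty-demo")
      .getOrCreate()
    import spark.implicits._

    // Sample data: one populated, one NULL, one empty-string TransactionId.
    val df = Seq(
      ("a", "x"),
      (null: String, "y"),
      ("", "z")
    ).toDF("TransactionId", "other")

    // Keeps the NULL row and the empty-string row, drops the populated one.
    df.filter(isNullOrEmpty(col("TransactionId"))).show()

    spark.stop()
  }
}
```

The same helper negated (`!isNullOrEmpty(...)`) gives the "present and non-empty" side of the split, so both branches of a function like `splitRequestIntoDFsWithAndWithoutTransactionId` can share one definition of "missing".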