Filtering out nulls and empty strings in Spark SQL

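OK, I have some data from which I want to filter out all nulls and empty values. So I started with a simple SQL command to filter out the nulls first.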

hiveContext.sql("select username from daten where username is not null").show()

What I get is something like this.

org.apache.spark.sql.DataFrame = [username: array<string>]

|        username|
|          [null]|
|          [null]|
|          [null]|
|              []|
|              []|
|          [null]|
|          [null]|
|              []|
|          [null]|
|          [null]|
|          [null]|
|          [null]|
|[dirk.staszak.3]|
|              []|
|              []|
|          [null]|
|          [null]|
|          [null]|
|          [null]|
|          [null]|

So there are still some null entries in there. I don't understand why.

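Is there a way to filter out these null entries using Spark SQL, and additionally filter out all empty strings? I thought about filtering by string length, but Spark SQL does not support a len function.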

You can create a DataFrame from the given table and use filter with a Spark SQL expression to do this:

dataframe.filter("username is not null and username != 'null'")
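One thing worth noting: the schema in the question is username: array<string>, so "is not null" tests the array itself. [null] is a non-null array holding one null element, and [] is a non-null empty array, which is likely why those rows survive the filter. The following is only a sketch of one way around that, not a tested fix for the original data: it flattens the arrays with LATERAL VIEW explode and then filters the individual strings. The SparkSession setup, the sample rows, and the alias names t and name are assumptions made up for illustration; only the table name daten and the column username come from the question.

```scala
import org.apache.spark.sql.SparkSession

object FilterUsernames {
  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession (Spark 2.0+) stands in for the question's hiveContext.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-usernames")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows imitating the output shown in the question.
    val daten = Seq(
      Seq[String](null),      // array with a single null element -> [null]
      Seq.empty[String],      // empty array -> []
      Seq("dirk.staszak.3"),  // a real username
      Seq("")                 // array with an empty string
    ).toDF("username")
    daten.createOrReplaceTempView("daten")

    // explode emits one row per array element; empty arrays produce no rows.
    // The WHERE clause then drops null elements and empty strings.
    val names = spark.sql(
      """SELECT name
        |FROM daten
        |LATERAL VIEW explode(username) t AS name
        |WHERE name IS NOT NULL AND name != ''""".stripMargin)

    names.show()
    spark.stop()
  }
}
```

On Spark 1.x, as in the question, the same query string could be passed to hiveContext.sql instead; SparkSession and createOrReplaceTempView need Spark 2.0 or later. As for the length check mentioned in the question, Spark SQL names the string function length rather than len, and size returns the number of elements in an array, so a condition such as size(username) > 0 would be another way to drop the [] rows while keeping the column as an array.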
