How to filter dates in a PySpark DataFrame



I have a PySpark DataFrame:

Year    Month
2021    06/01/2021
2021    06/01/2021
2021    07/01/2021
2021    07/01/2021
2021    0/01/2021
2021    0/01/2021

I need a DataFrame containing only one specific month together with '0/01/2021'. I tried the code below:

df=df.filter((col('Month')=='07/01/2021') & (col('Month')=='0/01/2021'))
display(df)

The DataFrame I need is:

Year    Month
2021    07/01/2021
2021    07/01/2021
2021    0/01/2021
2021    0/01/2021

But I get "Query returned no results". The "Month" column is a string. How can I filter on these dates?

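That is expected. Your filter asks for each row's Month to equal 07/01/2021 and (&) 0/01/2021 at the same time, which no single value can satisfy.
What you want is Month equal to 07/01/2021 or (|) 0/01/2021: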

from pyspark.sql.functions import col
a = [
(2021, "06/01/2021"),
(2021, "06/01/2021"),
(2021, "07/01/2021"),
(2021, "07/01/2021"),
(2021, "0/01/2021"),
(2021, "0/01/2021"),
]
b = "Year", "Month"
df = spark.createDataFrame(a, b)
# Use | (OR) instead of & (AND): keep a row if Month equals either value
df = df.filter((col("Month") == "07/01/2021") | (col("Month") == "0/01/2021"))
df.show()
+----+----------+
|Year|     Month|
+----+----------+
|2021|07/01/2021|
|2021|07/01/2021|
|2021| 0/01/2021|
|2021| 0/01/2021|
+----+----------+

You can also write it like this:

df.filter(col("Month").isin("07/01/2021", "0/01/2021")).show()
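
To see why the & version returns nothing without starting a Spark session, here is a minimal plain-Python sketch of the same row-by-row logic. It is only an illustration, not how Spark evaluates filters internally. The names rows, wanted, and_result, or_result and isin_result are made up for this example.

```python
# Same sample data as above, as plain tuples (Year, Month)
rows = [
    (2021, "06/01/2021"),
    (2021, "06/01/2021"),
    (2021, "07/01/2021"),
    (2021, "07/01/2021"),
    (2021, "0/01/2021"),
    (2021, "0/01/2021"),
]
wanted = {"07/01/2021", "0/01/2021"}

# AND: one string can never equal two different values at once, so nothing matches
and_result = [r for r in rows if r[1] == "07/01/2021" and r[1] == "0/01/2021"]

# OR: a row is kept if its Month equals either value
or_result = [r for r in rows if r[1] == "07/01/2021" or r[1] == "0/01/2021"]

# Membership test: the plain-Python counterpart of isin
isin_result = [r for r in rows if r[1] in wanted]

print(len(and_result), len(or_result), len(isin_result))
```

The AND list is empty, while the OR list and the membership list contain the same four rows, which mirrors the difference between the & filter and the | or isin filters.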
