I have a PySpark dataframe as shown below. I only need the rows whose target_date falls between 2017-12-17 and 2017-12-19, with both dates included. Input:
+-------+-----------+------------+
| id|target_date|order_before|
+-------+-----------+------------+
|1471783| 2017-12-16| 2|
|1471885| 2017-12-16| 2|
|1472928| 2017-12-17| 2|
|1476917| 2017-12-17| 2|
|1477469| 2017-12-18| 1|
|1478190| 2017-12-19| 4|
+-------+-----------+------------+
The output I need is as follows.
+-------+-----------+------------+
| id|target_date|order_before|
+-------+-----------+------------+
|1472928| 2017-12-17| 2|
|1476917| 2017-12-17| 2|
|1477469| 2017-12-18| 1|
|1478190| 2017-12-19| 4|
+-------+-----------+------------+
Just use between, which is inclusive of both bounds:
df.filter("target_date between '2017-12-17' and '2017-12-19'").show(truncate=False)
+-------+-----------+------------+
|id |target_date|order_before|
+-------+-----------+------------+
|1472928|2017-12-17 |2 |
|1476917|2017-12-17 |2 |
|1477469|2017-12-18 |1 |
|1478190|2017-12-19 |4 |
+-------+-----------+------------+
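
If you prefer the Column API over the SQL expression string, an equivalent filter is sketched below (assuming target_date is a date column or a 'yyyy-MM-dd' string, so the comparison follows calendar order):

from pyspark.sql import functions as F

# Column.between is inclusive on both ends, same as SQL BETWEEN above
df.filter(F.col("target_date").between("2017-12-17", "2017-12-19")).show(truncate=False)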