Env: Spark 1.6, Scala
My DataFrame looks like this:
DF=
DT        |col1|col2
----------|----|----
2017011011|AA  |BB
2017011011|CC  |DD
2017011015|PP  |BB
2017011015|QQ  |DD
2017011016|AA  |BB
2017011016|CC  |DD
2017011017|PP  |BB
2017011017|QQ  |DD
How can I filter it to get the same result as this SQL query? select * from DF where dt in (select distinct dt from DF order by dt desc limit 3)
The output should contain only the rows for the 3 most recent dates:
2017011015|PP  |BB
2017011015|QQ  |DD
2017011016|AA  |BB
2017011016|CC  |DD
2017011017|PP  |BB
2017011017|QQ  |DD
Thanks
Hussain
Tested on Spark 1.6.1
// Enables the $"colName" syntax (note: implicits, not implicit)
import sqlContext.implicits._

// Rebuild the sample data from the question
val df = sqlContext.createDataFrame(Seq(
  (2017011011, "AA", "BB"),
  (2017011011, "CC", "DD"),
  (2017011015, "PP", "BB"),
  (2017011015, "QQ", "DD"),
  (2017011016, "AA", "BB"),
  (2017011016, "CC", "DD"),
  (2017011017, "PP", "BB"),
  (2017011017, "QQ", "DD")
)).select(
  $"_1".as("DT"),
  $"_2".as("col1"),
  $"_3".as("col2")
)

// Collect the 3 most recent distinct dates to the driver
val dates = df.select($"DT")
  .distinct()
  .orderBy($"DT".desc)
  .map(_.getInt(0))
  .take(3)

// Keep the rows whose DT equals any of those dates (an OR of equality tests)
val result = df.filter(dates.map($"DT" === _).reduce(_ || _))
result.show()
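The same filter can also be written with `Column.isin` instead of OR-ing the equality tests by hand. This is only a sketch: it assumes it runs in the same spark-shell session as the code above, so `df` already exists, and the names `latest` and `resultIsin` are made up for illustration.

```scala
import org.apache.spark.sql.functions.col

// Assumption: `df` is the DataFrame built above, with DT as an Int column.
// Collect the 3 most recent distinct dates to the driver.
val latest: List[Int] = df.select(col("DT"))
  .distinct()
  .orderBy(col("DT").desc)
  .limit(3)
  .collect()
  .map(_.getInt(0))
  .toList

// isin takes varargs, so expand the list with `: _*`
val resultIsin = df.filter(col("DT").isin(latest: _*))
resultIsin.show()
```

With the sample data, `latest` holds 2017011017, 2017011016 and 2017011015, and `resultIsin` contains the 6 rows for those dates. Collecting to the driver is fine for a handful of values like this; for a large set of dates a join against the distinct-dates DataFrame would probably be the better fit.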