DataFrame filter issue: how to solve it?



Env: Spark 1.6, Scala

My DataFrame looks like this:

DF=
DT        |col1|col2
----------|----|----
2017011011|AA  |BB
2017011011|CC  |DD
2017011015|PP  |BB
2017011015|QQ  |DD
2017011016|AA  |BB
2017011016|CC  |DD
2017011017|PP  |BB
2017011017|QQ  |DD

How can I filter it to get a result like this SQL: select * from DF where dt in (select distinct dt from DF order by dt desc limit 3)

The output should contain only the 3 most recent dates:

2017011015|PP  |BB
2017011015|QQ  |DD
2017011016|AA  |BB
2017011016|CC  |DD
2017011017|PP  |BB
2017011017|QQ  |DD

Thanks,
Hussein

Tested on Spark 1.6.1:

import sqlContext.implicits._

// Sample data matching the question
val df = sqlContext.createDataFrame(Seq(
  (2017011011, "AA", "BB"),
  (2017011011, "CC", "DD"),
  (2017011015, "PP", "BB"),
  (2017011015, "QQ", "DD"),
  (2017011016, "AA", "BB"),
  (2017011016, "CC", "DD"),
  (2017011017, "PP", "BB"),
  (2017011017, "QQ", "DD")
)).select(
  $"_1".as("DT"),
  $"_2".as("col1"),
  $"_3".as("col2")
) 
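
// Collect the three most recent distinct dates to the driver
val dates = df.select($"DT")
  .distinct()
  .orderBy(-$"DT")
  .map(_.getInt(0))
  .take(3)

// Keep the rows whose DT equals any of those dates
val result = df.filter(dates.map($"DT" === _).reduce(_ || _))
result.show()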
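
As a possible variation on the snippet above, the chain of equality tests can be swapped for Column.isin, which reads closer to the SQL "in" from the question. This is only a sketch, not the original answer's method: the names sc, latest and the app name "latest-dates" are made up for illustration, and the getOrCreate calls are there so that it should run both inside spark-shell and as a local script.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Reuse a running context (e.g. in spark-shell) or start a local one.
// The app name and master are assumptions for a local test run.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("latest-dates").setMaster("local[*]"))
val sqlContext = SQLContext.getOrCreate(sc)
import sqlContext.implicits._

// Same sample rows as in the answer above.
val df = Seq(
  (2017011011, "AA", "BB"),
  (2017011011, "CC", "DD"),
  (2017011015, "PP", "BB"),
  (2017011015, "QQ", "DD"),
  (2017011016, "AA", "BB"),
  (2017011016, "CC", "DD"),
  (2017011017, "PP", "BB"),
  (2017011017, "QQ", "DD")
).toDF("DT", "col1", "col2")

// The three most recent distinct dates, collected to the driver.
val latest: Array[Int] = df.select($"DT")
  .distinct()
  .orderBy($"DT".desc)
  .limit(3)
  .collect()
  .map(_.getInt(0))

// Keep only the rows whose DT is one of those dates.
val result = df.filter($"DT".isin(latest.toSeq: _*))
result.show()
```

Both versions collect the dates to the driver first, which should be fine for a handful of values like this. If the list of dates could grow large, a join against the distinct-dates DataFrame may be the better fit.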
