I have a table with an id column and a value column:
| id | value |
-----------------
| 1 | UnKnown |
| 1 | A |
| 2 | UnKnown |
| 2 | UnKnown |
| 3 | B |
| 3 | B |
| 3 | B |
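I need to select the distinct ids and their corresponding values from the table. Each id should appear only once in the result, and if an id has more than one value in the value column, only the non-UnKnown value should be returned.
So the result should look like this: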
| id | value |
-----------------
| 1 | A |
| 2 | UnKnown |
| 3 | B |
How can I implement a group-by with a condition like this, where 'UnKnown' is kept only if it is the only value for that id, in SQL or Spark Scala?
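Here is an example using Scala and Spark 2.0.0 SQL. You can try it in spark-shell. Note that the sample data spells the value 'Unknown'; the string comparison is case-sensitive, so adjust the literal to match your data.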
val v = Seq((1, "Unknown"), (1, "A"), (2, "Unknown"), (2, "Unknown"), (3, "B"), (3, "B"), (3, "B")).toDF("id", "value")
v.show
v.createOrReplaceTempView("v1")
// First branch: all known values. Second branch: ids that have no known value at all. UNION also removes the duplicate rows.
spark.sql("select * from v1 where value != 'Unknown' union (select * from v1 a where (select count(*) from v1 b where a.id = b.id and b.value != 'Unknown') < 1)").show
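The same result can probably also be reached with the DataFrame API and a single group-by, without the union and the correlated subquery. The following is only a sketch under a few assumptions: it runs in spark-shell (so `spark` already exists), the placeholder value is spelled 'Unknown' as in the sample data above, and each id has at most one distinct known value. If an id had several different known values, `max` would keep just one of them, whereas the union query above would return all of them.

```scala
import org.apache.spark.sql.functions.{coalesce, col, lit, max, when}
import spark.implicits._

val v = Seq((1, "Unknown"), (1, "A"), (2, "Unknown"), (2, "Unknown"), (3, "B"), (3, "B"), (3, "B"))
  .toDF("id", "value")

// when(...) without otherwise(...) yields null for 'Unknown' rows, and max ignores nulls,
// so the aggregate is null only for ids that have no known value at all.
val known = max(when(col("value") =!= "Unknown", col("value")))

// Fall back to 'Unknown' for ids where every row was 'Unknown'.
val result = v.groupBy("id")
  .agg(coalesce(known, lit("Unknown")).as("value"))
  .orderBy("id")

result.show()
```

With the sample data this should give one row per id: A for id 1, Unknown for id 2, and B for id 3.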