To remove duplicate rows, I tried this SQL:
val characters = MongoSpark.load[sparkSQL.Character](sparkSession)
characters.createOrReplaceTempView("characters")
val testsql = sparkSession.sql("SELECT * FROM characters GROUP BY title")
testsql.show()
But this SQL produces the error message below. If you know what causes it, please answer.
Thanks.
Parsing command: SELECT * FROM characters GROUP BY title
Exception in thread "main" org.apache.spark.sql.AnalysisException:
expression 'characters.`url`' is neither present in the group by, nor is it an aggregate function.
Add to group by or wrap in first() if you don't care which value you get.;;
Then I tried the following, but I don't know whether it is the correct solution...
Please answer. Thanks!
val characters = MongoSpark.load[sparkSQL.Character](sparkSession)
characters.createOrReplaceTempView("characters")
val testsql = sparkSession.sql("SELECT * FROM characters")
val testgrsql = testsql.groupBy("title")
testgrsql.show()
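The error message explains everything:
Parsing command: SELECT * FROM characters GROUP BY title
Exception in thread "main" org.apache.spark.sql.AnalysisException: expression 'characters.`url`' is neither present in the group by, nor is it an aggregate function.
Add to group by or wrap in first() if you don't care which value you get.
So if you want the first url value for each title, use first(url):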
characters.createOrReplaceTempView("characters")
val testsql = sparkSession.sql("SELECT title, first(url) FROM characters GROUP BY title")
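The same idea can also be written with the DataFrame API. The sketch below is only an illustration, not part of the original answer: the local session, the `session` name, and the sample rows are assumptions standing in for the MongoDB load. It also shows why the groupBy("title") attempt from the question cannot call show() directly: groupBy alone returns a RelationalGroupedDataset, so an aggregation such as first has to follow it. dropDuplicates is a different technique that keeps one whole row per title, which may be closer to "remove duplicate rows".

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

// Assumption: a local session and hand-made rows replace MongoSpark.load
val session = SparkSession.builder()
  .master("local[*]")
  .appName("dedup-sketch")
  .getOrCreate()
import session.implicits._

// Hypothetical sample data with a duplicated title
val characters = Seq(
  ("Alice", "http://example.com/a1"),
  ("Alice", "http://example.com/a2"),
  ("Bob", "http://example.com/b1")
).toDF("title", "url")

// DataFrame form of: SELECT title, first(url) AS url FROM characters GROUP BY title
val firstPerTitle = characters.groupBy("title").agg(first("url").as("url"))

// Alternative: keep one full row per title, all columns preserved
val deduped = characters.dropDuplicates("title")

firstPerTitle.show()
deduped.show()
```

With either approach, which url survives for a duplicated title is not guaranteed unless the data is ordered explicitly, so this fits only when any one of the duplicates is acceptable.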