I'm using PySpark with a SQL context, which lets you write SQL queries against your DataFrames. For some reason this command doesn't work, and I'm not sure why.
complaint_by_city = sqlContext.sql('SELECT City, COUNT(*) as `city_comp` '
'FROM c311 '
'GROUP BY City '
'COLLATE NOCASE '
'ORDER BY -city_comp '
'LIMIT 21 ')
Edit: this is the error it gives me:
ParseException: u"\nmismatched input 'COLLATE' expecting {<EOF>, ',', '.', '[', 'LIMIT', 'OR', 'AND', 'IN', NOT, 'BETWEEN', 'LIKE', RLIKE, 'IS', 'ASC', 'DESC', 'WINDOW', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', 'DIV', '&', '|', '^', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 81)\n\n== SQL ==\nSELECT City, COUNT(*) as `city_comp` FROM c311 GROUP BY City ORDER BY -city_comp COLLATE NOCASELIMIT 21 \n---------------------------------------------------------------------------------^^^\n"
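I would suggest the following. COLLATE NOCASE is SQLite syntax, and Spark SQL's parser rejects it (that is what the ParseException is pointing at). To group cities case-insensitively, normalize the column with LOWER() instead, and sort with DESC rather than negating the count: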
SELECT LOWER(City) as City, COUNT(*) as city_comp
FROM c311
GROUP BY LOWER(City)
ORDER BY city_comp DESC
LIMIT 21