When I run SQL over Parquet files, I always call sqlContext.read.parquet()
=> df.registerTempTable()
=> sqlContext.sql()
like this:
val df = sqlContext.read.parquet("path/to/2016.05.30/")
df.registerTempTable("tab")
sqlContext.sql("SELECT * FROM tab")
The Spark documentation says:
Instead of using read API to load a file into DataFrame and query it, you can also query that file directly with SQL.
val df = sqlContext.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
So I changed mine to:
val df = sqlContext.sql("SELECT * FROM parquet.`path/to/2016.05.30/`")
But I get an error:
org.apache.spark.sql.AnalysisException: no such table parquet.path/to/2016.05.30/;
How can I query the file directly?
Querying files directly is supported starting with Spark 1.6. Please check which Spark version you are running.
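A minimal sketch, assuming Spark 1.6+ with a `SparkContext` (`sc`) and `SQLContext` (`sqlContext`) already in scope, as in the spark-shell. The version check is just a quick way to confirm what you are running before trying the direct-file syntax:

```scala
// Print the version of the running Spark cluster; the backtick form
// below only works on 1.6 or later.
println(sc.version)

// Query the Parquet directory directly, without read.parquet() and
// registerTempTable(). Note the backticks around the path: they are
// required so SQL treats "path/to/2016.05.30/" as a path, not a
// table identifier.
val df = sqlContext.sql("SELECT * FROM parquet.`path/to/2016.05.30/`")
df.show()
```

On versions before 1.6 the same statement fails with the `AnalysisException: no such table` error shown above, so the fallback there is the original read/registerTempTable approach.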