I am trying to compute statistics for a Hive table using the Spark SQL context.
Spark version: 1.6.3
sqlContext.sql("ANALYZE TABLE sample PARTITION (company='aaa', market='aab',pdate='2019-01-10') COMPUTE STATISTICS FOR COLUMNS")
I get the error below, even though I can execute the same query in Hive directly.
Error:
org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Could not initialize class com.sun.jersey.core.header.MediaTypes
When I run it without `FOR COLUMNS`, as shown below, I get a "Partition not found" error:
sqlContext.sql("ANALYZE TABLE sample PARTITION (company='aaa', market='aab',pdate='2019-01-10') COMPUTE STATISTICS")
Error:
org.apache.spark.sql.execution.QueryExecutionException: FAILED: SemanticException [Error 10006]: Line 1:56 Partition not found ''2019-01-10''
Please let me know how to fix this.
Thanks!
Could you try the queries below?
sqlContext.sql("ANALYZE TABLE sample COMPUTE STATISTICS FOR COLUMNS col1 [, col2, ...]")
sqlContext.sql("ANALYZE TABLE sample COMPUTE STATISTICS [NOSCAN]")
See https://docs.databricks.com/spark/latest/spark-sql/language-manual/analyze-table.html for more information.
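The doubled quotes in the error (`Partition not found ''2019-01-10''`) suggest the partition value may be getting quoted twice on its way to Hive. One way to avoid hand-quoting mistakes is to assemble the statement programmatically before passing it to `sqlContext.sql`. A minimal sketch in Python; the `build_analyze_sql` helper is hypothetical, not part of Spark's API:

```python
def build_analyze_sql(table, partition=None, columns=None, noscan=False):
    """Build an ANALYZE TABLE statement with a single layer of quoting
    around each partition value. Hypothetical helper for illustration."""
    sql = "ANALYZE TABLE {}".format(table)
    if partition:
        # Each value is wrapped in exactly one pair of single quotes.
        spec = ", ".join("{}='{}'".format(k, v) for k, v in partition.items())
        sql += " PARTITION ({})".format(spec)
    sql += " COMPUTE STATISTICS"
    if columns:
        sql += " FOR COLUMNS " + ", ".join(columns)
    elif noscan:
        sql += " NOSCAN"
    return sql

stmt = build_analyze_sql(
    "sample",
    partition={"company": "aaa", "market": "aab", "pdate": "2019-01-10"},
)
print(stmt)
# ANALYZE TABLE sample PARTITION (company='aaa', market='aab', pdate='2019-01-10') COMPUTE STATISTICS
```

You would then run `sqlContext.sql(stmt)`. Printing the statement first makes it easy to spot extra quoting before Hive ever sees it.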