Why does Zeppelin fail with "mismatched input ';' expecting <EOF>" in a %spark.sql paragraph?

I built a Parquet file from a CSV.

In Zeppelin, I created a SQL statement as follows:

%spark.sql
DROP TABLE IF EXISTS df;
CREATE TABLE df (
date_time STRING
, site_name STRING
, posa_continent STRING
, user_location_country STRING
, user_location_region STRING
, user_location_city STRING
, orig_destination_distance DOUBLE
, user_id STRING
, is_mobile STRING
, is_package STRING
, channel STRING
, srch_ci STRING
, srch_co STRING
, srch_adults_cnt INT 
, srch_children_cnt INT
, srch_rm_cnt INT
, srch_destination_id STRING
, srch_destination_type_id STRING
, is_booking STRING
, cnt INT
, hotel_continent STRING
, hotel_country STRING
, hotel_market STRING
, hotel_cluster STRING)
USING parquet
OPTIONS (path "s3://hansprojekt/training_17000000pq")

As a result I get an error:

mismatched input ';' expecting <EOF>(line 1, pos 23)
== SQL ==
DROP TABLE IF EXISTS df;
-----------------------^^^
CREATE TABLE df (
date_time STRING
, site_name STRING
, posa_continent STRING
, user_location_country STRING
, user_location_region STRING
, user_location_city STRING
, orig_destination_distance DOUBLE
, user_id STRING
, is_mobile STRING
, is_package STRING
, channel STRING
, srch_ci STRING
, srch_co STRING
, srch_adults_cnt INT 
, srch_children_cnt INT
, srch_rm_cnt INT
, srch_destination_id STRING
, srch_destination_type_id STRING
, is_booking STRING
, cnt INT
, hotel_continent STRING
, hotel_country STRING
, hotel_market STRING
, hotel_cluster STRING)
USING parquet
OPTIONS (path "s3://hansprojekt/training_17000000pq")
set zeppelin.spark.sql.stacktrace = true to see full stacktrace

I don't understand the problem. The CSV is separated by ",".

Can anyone help me?

Use exactly one SQL statement per %spark.sql paragraph (aka code section) in Zeppelin.

So, put this line in one paragraph:

DROP TABLE IF EXISTS df

and this one in another %spark.sql paragraph:

CREATE TABLE df (
date_time STRING
, site_name STRING
, posa_continent STRING
, user_location_country STRING
, user_location_region STRING
, user_location_city STRING
, orig_destination_distance DOUBLE
, user_id STRING
, is_mobile STRING
, is_package STRING
, channel STRING
, srch_ci STRING
, srch_co STRING
, srch_adults_cnt INT 
, srch_children_cnt INT
, srch_rm_cnt INT
, srch_destination_id STRING
, srch_destination_type_id STRING
, is_booking STRING
, cnt INT
, hotel_continent STRING
, hotel_country STRING
, hotel_market STRING
, hotel_cluster STRING)
USING parquet
OPTIONS (path "s3://hansprojekt/training_17000000pq")

%spark.sql gives you a SQL environment backed by Spark SQL (via SparkSQLInterpreter).

If I'm not mistaken, when a result is requested SparkSQLInterpreter simply executes SQLContext.sql:

// method signature of sqlc.sql() is changed
// from  def sql(sqlText: String): SchemaRDD (1.2 and prior)
// to    def sql(sqlText: String): DataFrame (1.3 and later).
// Therefore need to use reflection to keep binary compatibility for all spark versions.
Method sqlMethod = sqlc.getClass().getMethod("sql", String.class);
rdd = sqlMethod.invoke(sqlc, st);

This points to SQLContext.sql as the "execution environment". Its documentation reads:

sql(sqlText: String): DataFrame
Executes a SQL query using Spark, returning the result as a DataFrame.

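sql expects exactly one SQL statement, which is why the parser reports "expecting <EOF>" as soon as it reaches the ';' at position 23.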
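If keeping both statements in one place matters, one possible workaround is to split the script yourself and hand each statement to sql separately. The following is only a minimal sketch in Python, not something Zeppelin provides: split_statements and run_script are hypothetical helper names, and the splitter only understands simple quoted strings (no backslash escapes, no SQL comments).

```python
from typing import Callable, List


def split_statements(script: str) -> List[str]:
    """Split a SQL script on ';' characters that sit outside quoted strings."""
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, if any
    for ch in script:
        if quote:
            current.append(ch)
            if ch == quote:
                quote = None  # closing quote reached
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif ch == ";":
            # End of a statement: store it without the terminator.
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    # Drop empty fragments, e.g. the one after a trailing ';'.
    return [s for s in statements if s]


def run_script(script: str, run_sql: Callable[[str], object]) -> List[object]:
    """Call run_sql once per statement, never passing a ';' terminator."""
    return [run_sql(stmt) for stmt in split_statements(script)]
```

In a PySpark paragraph this could be called as run_script(script, spark.sql), assuming the session object is exposed under the name spark (or sqlContext.sql on older setups). For anything beyond simple scripts, separate %spark.sql paragraphs remain the safer choice.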
