Spark SQL - RuntimeException while determining schema



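I am trying to query a table in a remote (on-premises) Hive database from my laptop, using Spark SQL. I am able to connect to it and retrieve the latest partition.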

However, when I try to retrieve a single column (say pid), it throws the following error:

19/10/08 15:01:19 ERROR Table: Unable to get field from serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem: Unable to read schema from given path: maprfs:/user/<database_name>//<table_name>/schema/partition_epoch=<partition_id>/xyz.avsc)
Caused by: java.net.MalformedURLException: unknown protocol: maprfs
at java.net.URL.<init>(URL.java:600)
at java.net.URL.<init>(URL.java:490)
at java.net.URL.<init>(URL.java:439)
at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException

I tried running the describe-table command and printing the schema:

Dataset<Row> descTable = spark.sql("desc db.tablename");
descTable.printSchema();

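The printed schema looks wrong: it does not list any of the table's fields. Instead, it prints the column headers of the describe output: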

root
|-- col_name: string (nullable = false)
|-- data_type: string (nullable = false)
|-- comment: string (nullable = true)

I was expecting something like this:

pid  string   from deserializer

When I query with an explicit field, the code eventually fails:

19/10/08 15:01:25 WARN HiveExternalCatalog: The table schema given by Hive metastore(struct<partition_epoch:string>) is different from the schema when this table was created by Spark SQL(struct<all fields and their type>,partition_epoch:string>). We have to fall back to the table schema from Hive metastore which is not case preserving.
19/10/08 15:01:25 ERROR: exception: cannot resolve '`pid`' given input columns: [db.tablename.partition_epoch]; line 1 pos 7;
cannot resolve '`pid`' given input columns: [db.tablename.partition_epoch]; line 1 pos 7;
'Project ['pid]
+- Filter (cast(partition_epoch#9 as int) = 1570500000)
+- SubqueryAlias `db`.`tablename`
+- HiveTableRelation `db`.`tablename`, org.apache.hadoop.hive.serde2.avro.AvroSerDe, [partition_epoch#9]

Here is the code I use to create the SparkSession and query the table:

SparkSession spark = SparkSession
.builder()
.master("local[*]")
.appName("myApp")
.config("spark.hadoop.javax.jdo.option.ConnectionURL","jdbc:mysql://url:3306/metastore?createDatabaseIfNotExist=false")
.config("spark.hadoop.javax.jdo.option.ConnectionDriverName","com.mysql.jdbc.Driver")
.config("spark.hadoop.javax.jdo.option.ConnectionUserName","username")
.config("spark.hadoop.javax.jdo.option.ConnectionPassword","password")
.config("hive.metastore.warehouse.dir", "hive")
.enableHiveSupport()
.getOrCreate();
Dataset<Row> productIds = spark.sql("select pid FROM db.tablename WHERE partition_epoch="+partitionEpoch);
System.out.println(productIds.collect());

I looked for hive.metastore.uris in hive-site.xml under etc/hive/conf, but it does not contain that setting.

How can I resolve the schema error and query the table?
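The `MalformedURLException: unknown protocol: maprfs` line in the stack trace is raised by the `java.net.URL` constructor, which only accepts schemes that have a protocol handler registered in the local JVM. The following is a minimal sketch of that behaviour outside Spark and Hive. The class name, helper names, and the concrete path are hypothetical stand-ins (the real path is elided in the log), and the sketch assumes a plain JVM with no custom URL stream handler installed.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class MaprfsUrlDemo {
    // Returns true if java.net.URL accepts the location, false if it
    // rejects it with a MalformedURLException (e.g. unknown protocol).
    static boolean urlAccepts(String location) {
        try {
            new URL(location);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // java.net.URI only parses the string, so it can report the scheme
    // even when no protocol handler exists for it.
    static String schemeOf(String location) {
        return URI.create(location).getScheme();
    }

    public static void main(String[] args) {
        // Hypothetical path; the real one is elided in the question's log.
        String schemaPath = "maprfs:/user/db/table/schema/xyz.avsc";
        System.out.println("URL accepts it: " + urlAccepts(schemaPath));
        System.out.println("URI scheme:     " + schemeOf(schemaPath));
    }
}
```

If this is what happens on the laptop, the Avro schema file cannot be read there, which would be consistent with the metastore reporting only the partition column. How to make the maprfs scheme resolvable locally depends on the cluster setup and is not something the trace itself shows.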

descTable.printSchema() describes the schema of the DataFrame returned by the desc command, not the schema of the table. Use descTable.show() to see the table's schema.

To verify the problem, check the table property spark.sql.sources.schema. The table's schema must be identical to the schema described by this property.
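As a rough illustration of that comparison, the sketch below takes two column lists and reports which columns Spark recorded but the metastore does not return. The class and method names are hypothetical. One way to obtain the inputs might be `spark.sql("desc db.tablename").show()` for the metastore side and `spark.sql("SHOW TBLPROPERTIES db.tablename").show(false)` for the Spark side; depending on the Spark version, the schema may be stored under several keys that begin with spark.sql.sources.schema rather than a single one. Names are compared case-insensitively because, as the warning in the question notes, the metastore schema is not case preserving.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SchemaDiff {
    // Returns the columns present in sparkColumns but absent from
    // metastoreColumns, ignoring case.
    static List<String> missingFromMetastore(List<String> metastoreColumns,
                                             List<String> sparkColumns) {
        Set<String> known = new HashSet<>();
        for (String column : metastoreColumns) {
            known.add(column.toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String column : sparkColumns) {
            if (!known.contains(column.toLowerCase(Locale.ROOT))) {
                missing.add(column);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The metastore side comes from the warning in the question.
        // The Spark side is elided there, so "pid" stands in for it.
        List<String> metastore = Arrays.asList("partition_epoch");
        List<String> spark = Arrays.asList("pid", "partition_epoch");
        System.out.println(missingFromMetastore(metastore, spark)); // prints [pid]
    }
}
```

A non-empty result matches the situation in the question: Spark expects pid, but the relation built from the metastore exposes only partition_epoch, so the query cannot resolve the column.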
