"Start token not found" error when using JsonSerDe



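I'm trying to import JSON data from S3 and, after running some queries, export the output back to S3 as JSON. However, I get an "org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected" error when running Hive on an EMR cluster. To narrow down the problem I simplified both the Hive script and the JSON data, but it keeps failing with the same error. How can I fix this?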

Cluster configuration:

Release: emr-5.3.1

Hive version: 2.1.1

Hadoop distribution: Amazon 2.7.3

Service role: EMR_DefaultRole

MasterInstanceType: m4.large

Contents of the simplified JSON data:

[{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]

Hive script:

DROP TABLE IF EXISTS SOURCE;
DROP TABLE IF EXISTS DESTINATION;
CREATE EXTERNAL TABLE SOURCE(MyID STRING, MyField STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://myPath/subPath/';
CREATE EXTERNAL TABLE DESTINATION(MyID STRING, MyField STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION 's3://anotherPath/subPath/';
INSERT OVERWRITE TABLE DESTINATION SELECT MyID, MyField FROM SOURCE;

Here is the stack trace:

..."MyID":"BAR123","MyField":"BAR"}]
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: ..."MyID":"BAR123","MyField":"BAR"}]
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:95)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:70)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:383)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable [{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:86)
    ... 17 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
    at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:183)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:128)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:92)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:488)
    ... 18 more
Caused by: ...
    at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:169)
    ... 21 more

Thanks.

The JSON should start with { rather than with an array ([).
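With the default text input format, JsonSerDe is handed one line of the file at a time, so each line needs to be a complete JSON object on its own. One way to get there from the array in the question is to rewrite the file as one object per line before uploading it to S3. The following is a minimal sketch of that conversion; the function name array_to_json_lines is made up for illustration, and it assumes the whole file fits in memory.

```python
import json


def array_to_json_lines(text: str) -> str:
    """Turn a top-level JSON array of objects into one compact object per line.

    Hypothetical helper for illustration; not part of Hive or EMR.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators keep every record on a single line with no padding.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


if __name__ == "__main__":
    sample = '[{"MyID":"FOO123","MyField":"FOO"},{"MyID":"BAR123","MyField":"BAR"}]'
    print(array_to_json_lines(sample), end="")
```

The output has no enclosing brackets and no commas between records, only a newline after each object.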

I tried this approach and updated my JSON file to:

{"MyID":"FOO123","MyField":"FOO"},
{"MyID":"BAR123","MyField":"BAR"}

But once the job finished, I noticed that only the first object had been inserted.
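The thread does not establish why only one row was loaded. A quick check that may help is to confirm that every line of the file parses on its own as a JSON object, with nothing left over such as a comma after the closing brace. The sketch below does that with Python's json module; find_bad_lines is a hypothetical name, and Python's parser may be stricter than the one Hive uses, so a flagged line is a hint rather than proof of the cause.

```python
import json


def find_bad_lines(text: str) -> list:
    """Return (line_number, reason) pairs for lines that are not a standalone JSON object.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            # Covers trailing commas, truncated objects, and other syntax errors.
            problems.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems


if __name__ == "__main__":
    data = '{"MyID":"FOO123","MyField":"FOO"},\n{"MyID":"BAR123","MyField":"BAR"}\n'
    for number, reason in find_bad_lines(data):
        print(f"line {number}: {reason}")
```

Run against the two-line file above, this flags the first line because of the comma that follows the object.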
