Hive record is inserted, but an error occurs afterwards



I created a table in Hive:

CREATE TABLE `test3`.`shop_dim`  ( 
`shop_id`               bigint, 
`shop_name`             string, 
`shop_company_id`       bigint, 
`shop_url1`             string, 
`shop_url2`             string, 
`sid`                   string, 
`shop_open_duration`    string, 
`date_modified`         timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ("path"="hdfs://myhdfs/warehouse/tablespace/managed/hive/test3.db/shop_dim")
STORED AS PARQUET
TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='{"BASIC_STATS":"true","COLUMN_STATS":{"date_modified":"true","shop_company_id":"true","shop_id":"true","shop_name":"true","shop_open_duration":"true","shop_url1":"true","shop_url2":"true","sid":"true"}}', 'bucketing_version'='2', 'numFiles'='12', 'numRows'='12', 'rawDataSize'='96', 'spark.sql.create.version'='2.3.0', 'spark.sql.sources.provider'='parquet', 'spark.sql.sources.schema.numParts'='1', 'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"Shop_id","type":"long","nullable":true,"metadata":{}},{"name":"Shop_name","type":"string","nullable":true,"metadata":{}},{"name":"Shop_company_id","type":"long","nullable":true,"metadata":{}},{"name":"Shop_url1","type":"string","nullable":true,"metadata":{}},{"name":"Shop_url2","type":"string","nullable":true,"metadata":{}},{"name":"sid","type":"string","nullable":true,"metadata":{}},{"name":"Shop_open_duration","type":"string","nullable":true,"metadata":{}},{"name":"Date_modified","type":"timestamp","nullable":true,"metadata":{}}]}', 'totalSize'='17168')
GO

Then I inserted a record with the SQL below:

insert into test3.shop_dim values(11,'aaa',22,'11113','2222','sid','opend',unix_timestamp())

I can see that the record was inserted, but after a long wait I got this error:

>[Error] Script lines: 1-2 --------------------------
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.StatsTask 
[Executed: 2018-10-24 12:00:03 PM] [Execution: 0ms] 

I am using Aqua Data Studio as the client tool. Why does this error occur?

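This problem can occur when a value being inserted does not match the expected column type. In your case, the `date_modified` column has type timestamp, but unix_timestamp() returns a bigint (the current Unix timestamp in seconds).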

If you run the query

select unix_timestamp();

the output looks like this: 1558547043

Instead, you need to use current_timestamp.

select current_timestamp;

The output looks like this: 2019-05-22 17:50:18.803
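Applied to the INSERT from the question, the fix could look like the sketch below. It is written as INSERT ... SELECT rather than INSERT ... VALUES on the assumption that this form is accepted by more Hive versions; the second variant, which converts the Unix seconds explicitly, is an alternative not mentioned above and is shown only for illustration.

```sql
-- Option 1: current_timestamp already has type timestamp,
-- so it matches the date_modified column directly.
insert into test3.shop_dim
select 11, 'aaa', 22, '11113', '2222', 'sid', 'opend', current_timestamp;

-- Option 2 (assumption: you want to keep unix_timestamp()):
-- from_unixtime() turns the bigint seconds into a 'yyyy-MM-dd HH:mm:ss' string,
-- which is then cast to timestamp.
insert into test3.shop_dim
select 11, 'aaa', 22, '11113', '2222', 'sid', 'opend',
       cast(from_unixtime(unix_timestamp()) as timestamp);
```

Either statement writes a value whose type matches the column, so no bigint is stored where a timestamp is expected.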

For the built-in date functions, see the Hive manual at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions

The Hive settings given below can help resolve the org.apache.hadoop.hive.ql.exec.StatsTask error (state=08S01,code=1):

set hive.stats.column.autogather=false;

or

set hive.stats.autogather=false;
set hive.optimize.sort.dynamic.partition=true;
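As a rough sketch of how these settings might be used in practice: they are issued in the same session before the INSERT, since `set` only affects the current session. The explicit `analyze table` statements at the end are an optional addition, not part of the answer above, for gathering statistics manually once automatic gathering is off.

```sql
-- Turn off automatic statistics gathering for this session only.
set hive.stats.autogather=false;
set hive.stats.column.autogather=false;

-- Re-run the insert; with autogather off, the StatsTask step should be skipped.
insert into test3.shop_dim
select 11, 'aaa', 22, '11113', '2222', 'sid', 'opend', current_timestamp;

-- Optional (assumption: statistics are still wanted for query planning):
-- compute table-level and column-level statistics explicitly.
analyze table test3.shop_dim compute statistics;
analyze table test3.shop_dim compute statistics for columns;
```

Disabling autogather hides the failing step rather than explaining it, so if the explicit `analyze table` also fails, its error message may point closer to the underlying cause.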
