When I try to run a Hive UPDATE statement, I get the following error:
2021-02-25 15:38:54,934 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1592334694783_33388_r_000007_3: Error: java.lang.RuntimeException:
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":0,"bucketid":-1,"rowid":3}},"value":{"_col0":"T","_col1":1111111,"......."_col44":""}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:790)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
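The update query is simple.
All columns in the target table are STRING or DECIMAL.
I found another question pointing to a Cloudera link, but my situation is different: this query runs fine most of the time and only fails on certain kinds of data.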
Update statement:
UPDATE Table1 a
SET
email = MaskData(email,1)
WHERE d_Date >= '2017-01-01' and
email IN (select distinct email from Table2);
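One way to narrow down which rows trigger the failure is to look at where Hive thinks each row lives. The reducer key in the error above shows "bucketid":-1, and one plausible reading (an assumption, not confirmed here) is that some data files in the table do not follow the bucket_NNNNN naming Hive uses for transactional tables, so Hive cannot assign them a bucket. The query below is a sketch assuming Hive 2.x ACID, where the ROW__ID virtual column has the transactionid/bucketid/rowid fields seen in the error, and using the INPUT__FILE__NAME virtual column; the aliases are hypothetical.

```sql
-- Sketch (assumes Hive 2.x ACID): group the rows the UPDATE would touch by
-- source file and by the bucket id Hive assigns them. Files whose rows report
-- bucket_id = -1 are the suspects; aliases are hypothetical.
SELECT INPUT__FILE__NAME AS file_name,
       ROW__ID.bucketid  AS bucket_id,
       COUNT(*)          AS row_count
FROM Table1
WHERE d_Date >= '2017-01-01'
GROUP BY INPUT__FILE__NAME, ROW__ID.bucketid;
```

If the suspect files have names like part-00000-... instead of bucket_00000, they were probably written outside Hive's ACID writer.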
Any way forward or help would be appreciated. Thanks in advance.
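It turned out that the data had not been stored correctly when we inserted it from Spark. We had to rebuild the entire table, after which the update worked fine.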
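A rebuild along those lines might look like the sketch below. It assumes Hive 2.x with ACID enabled, and everything other than Table1, email and d_Date is hypothetical: the backup table name, the id bucketing column, the bucket count and the elided column list must be replaced with the real schema. The bucketing column should not be email, because Hive does not allow UPDATE to change bucketing columns. DROP TABLE on a managed table deletes its data, so verify the backup before running step 2.

```sql
-- Sketch only (assumes Hive 2.x ACID); names other than Table1, email and
-- d_Date are hypothetical placeholders.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- 1. Copy the current contents into a plain ORC backup table.
CREATE TABLE Table1_backup STORED AS ORC AS SELECT * FROM Table1;

-- 2. Recreate the target as a bucketed, transactional ORC table.
DROP TABLE Table1;
CREATE TABLE Table1 (
  id     STRING,  -- hypothetical bucketing column, never updated
  email  STRING,
  d_Date STRING
  -- ... remaining STRING / DECIMAL columns, same order as the backup ...
)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- 3. Reload through Hive so the files get Hive's own bucket layout.
INSERT INTO TABLE Table1 SELECT * FROM Table1_backup;
```

Loading through Hive rather than Spark is the point of step 3: the assumption is that only Hive's writer produces files the ACID UPDATE path can map to a bucket.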