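We have one Hadoop cluster running HDP 2.2.0.0.
We have a second Hadoop cluster running HDP 2.2.4.2.
We have an Oozie workflow with a Hive action that runs fine on the first cluster (HDP 2.2.0.0).
The same workflow fails on the second cluster (HDP 2.2.4.2) with this error: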
38098 [main] INFO org.apache.hadoop.hive.ql.Driver - Starting task [Stage-4:MOVE] in serial mode
2015-07-15 16:23:22,810 INFO [main] ql.Driver (Driver.java:launchTask(1604)) - Starting task [Stage-4:MOVE] in serial mode
38099 [main] INFO org.apache.hadoop.hive.ql.exec.Task - Moving data to: hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10000 from hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10002
2015-07-15 16:23:22,811 INFO [main] exec.Task (SessionState.java:printInfo(824)) - Moving data to: hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10000 from hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10002
40129 [main] ERROR hive.ql.metadata.Hive - Unable to move using hadoop distcp, source hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10002 to destination hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10000 using command: /usr/bin/hadoop distcp hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10002 hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10000
2015-07-15 16:23:24,841 ERROR [main] metadata.Hive (Hive.java:renameFile(2444)) - Unable to move using hadoop distcp, source hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10002 to destination hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10000 using command: /usr/bin/hadoop distcp hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10002 hdfs://master-1.local:8020/tmp/hive/cloudfeeds/00f8edac-8b5a-4dfa-9115-5a915acabee0/hive_2015-07-15_16-22-49_023_841777402951025944-1/-ext-10000
40129 [main] ERROR hive.ql.metadata.Hive - Exit value for hadoop distcp command 255
Further down in the log, we see this error:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=yarn, access=EXECUTE, inode="/tmp/hive/cloudfeeds":cloudfeeds:hdfs:drwx------
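I checked the permissions of the directory named above, /tmp/hive/cloudfeeds. On both clusters it has the same permissions (700) and the same owner (cloudfeeds).
I also checked the MapReduce job logs, and both clusters show these settings: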
user.name=yarn
mapreduce.job.user.name=cloudfeeds
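To make the failure concrete, here is a minimal Python sketch of the owner/group/other check that the exception message implies. It is a simplified model, not HDFS code: it ignores the superuser and ACLs, the helper name `may_access` is hypothetical, and the group membership given to `yarn` is an assumption. The owner, group, and mode come from the exception above.

```python
EXECUTE = 0o1  # the "x" bit; on a directory it allows traversal


def may_access(user, user_groups, owner, group, mode, want):
    """Simplified POSIX-style check: use owner bits, else group bits, else other bits."""
    if user == owner:
        bits = (mode >> 6) & 0o7
    elif group in user_groups:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return (bits & want) == want


# inode="/tmp/hive/cloudfeeds":cloudfeeds:hdfs:drwx------
# Assumption: yarn is not a member of the "hdfs" group.
print(may_access("yarn", {"hadoop"}, "cloudfeeds", "hdfs", 0o700, EXECUTE))
print(may_access("cloudfeeds", {"cloudfeeds"}, "cloudfeeds", "hdfs", 0o700, EXECUTE))
```

Under this model, any step that reaches the directory as `yarn` is denied, while the same step as `cloudfeeds` is allowed. That fits the distcp command in the log failing with `user=yarn` even though the directory permissions are identical on both clusters.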
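I don't want to simply turn off dfs.permissions. I also don't want to give the directory /tmp/hive/cloudfeeds 777 permissions, although I'm sure that would make the job succeed.
Any ideas on how I should debug this and, more importantly, how to fix it?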
I resolved the permission problem by adding this to hive-site.xml:
<property>
<name>hive.scratch.dir.permission</name>
<value>777</value>
<description>The permission for the user specific scratch directories that get created.</description>
</property>
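As a small illustration of what this value means, the Python sketch below parses a copy of the property and prints the resulting directory mode next to the original 700. The helper names `scratch_dir_mode` and `describe` are hypothetical, and the sketch assumes the value is an octal permission string, as it is here.

```python
import stat
import xml.etree.ElementTree as ET

# Copy of the property above (description omitted for brevity).
PROPERTY_XML = """
<property>
  <name>hive.scratch.dir.permission</name>
  <value>777</value>
</property>
"""


def scratch_dir_mode(xml_text):
    """Read the <value> element and interpret it as an octal mode."""
    prop = ET.fromstring(xml_text)
    return int(prop.findtext("value").strip(), 8)


def describe(mode):
    """Render a directory mode the way a listing such as `ls -l` would show it."""
    return stat.filemode(stat.S_IFDIR | mode)


print(describe(scratch_dir_mode(PROPERTY_XML)))  # drwxrwxrwx
print(describe(0o700))                           # drwx------
```

With 777, every user, including `yarn`, can traverse and write to the per-user scratch directories. That is why the job succeeds, and it is also the broad access the question was hoping to avoid.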