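I created an Oozie Sqoop job to import data from MySQL into Hive. I have one NameNode and 3 DataNodes; Hive, Oozie, and Sqoop are also installed on the NameNode.
The Sqoop import command has been tested via the CLI on the NameNode and works, but every time I run it as an Oozie Sqoop job it fails. The detailed error is below.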
2017-08-11 11:27:40,787 [uber-SubtaskRunner] INFO org.apache.hadoop.mapreduce.Job - map 0% reduce 0%
2017-08-11 11:27:44,833 [uber-SubtaskRunner] INFO org.apache.hadoop.mapreduce.Job - map 25% reduce 0%
2017-08-11 11:27:45,837 [uber-SubtaskRunner] INFO org.apache.hadoop.mapreduce.Job - map 75% reduce 0%
2017-08-11 11:27:46,844 [uber-SubtaskRunner] INFO org.apache.hadoop.mapreduce.Job - map 100% reduce 0%
2017-08-11 11:27:46,856 [uber-SubtaskRunner] INFO org.apache.hadoop.mapreduce.Job - Job job_1502360348741_0041 completed successfully
...
2017-08-11 11:27:46,932 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 625 bytes in 12.0595 seconds (51.8263 bytes/sec)
2017-08-11 11:27:46,936 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 14 records.
2017-08-11 11:27:46,951 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM UserRole AS t WHERE 1=0
2017-08-11 11:27:46,952 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM UserRole AS t WHERE 1=0
2017-08-11 11:27:46,953 [uber-SubtaskRunner] WARN org.apache.sqoop.hive.TableDefWriter - Column updatedDate had to be cast to a less precise type in Hive
2017-08-11 11:27:46,960 [uber-SubtaskRunner] INFO org.apache.sqoop.hive.HiveImport - Loading uploaded data into Hive
2017-08-11 11:27:46,963 [uber-SubtaskRunner] ERROR org.apache.sqoop.tool.ImportTool - Encountered IOException running import job: java.io.IOException: Cannot run program "hive": error=2,
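Here are my thoughts:
- The mapper job was generated and completed, so the script must have been submitted and run on the NameNode. Is that right?
- I have all the environment variables configured, so the failing stage, in which the import tool loads the data into the Hive table, must have been launched on one of the DataNodes.
So should I install Hive on every DataNode in the cluster? Or is there some configuration I can change to solve this?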
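One way to check that guess: `error=2` in the message above is the OS error code ENOENT, which suggests the `hive` executable could not be found on the PATH of whichever node ran the Sqoop launcher. The following is a minimal Python sketch that could be run on each node to see whether `hive` is resolvable there. The helper names `describe_errno` and `find_executable` are hypothetical and exist only for this illustration.

```python
import errno
import os
import shutil
from typing import Optional


def describe_errno(code: int) -> str:
    """Return the symbolic name and OS message for an error code."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def find_executable(name: str, path: Optional[str] = None) -> Optional[str]:
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name, path=path)


if __name__ == "__main__":
    # Decode the error code reported in the Sqoop log.
    print("error=2 means", describe_errno(2))

    # Check whether the current user on this node can resolve "hive".
    location = find_executable("hive")
    if location is None:
        print("hive is NOT on the PATH of this node")
    else:
        print("hive found at", location)
```

If `hive` resolves on the NameNode but not on the DataNodes, that would be consistent with the import succeeding from the CLI and failing under Oozie.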
- Make sure hive-site.xml is available in HDFS.
- Copy all the libraries from the Hive share lib folder into the Sqoop share lib folder.
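The two steps above could be scripted roughly as follows. This is only a sketch under assumptions: the local hive-site.xml path, the workflow directory, and the share lib directory (including the `lib_TIMESTAMP` placeholder) are hypothetical and have to be replaced with the values from your own cluster, and `build_commands` and `run` are helper names made up for this example. It prints the `hdfs dfs` commands by default and only executes them when `dry_run` is set to `False`.

```python
import shlex
import subprocess
from typing import List


def build_commands(hive_site: str, workflow_dir: str, sharelib_dir: str) -> List[List[str]]:
    """Build the hdfs commands for the two steps (all paths are assumptions)."""
    return [
        # Step 1: upload hive-site.xml next to the workflow in HDFS.
        ["hdfs", "dfs", "-put", "-f", hive_site, workflow_dir],
        # Step 2: copy the Hive share lib jars into the Sqoop share lib folder.
        # The glob is passed through literally and expanded by hdfs itself.
        ["hdfs", "dfs", "-cp", "-f",
         f"{sharelib_dir}/hive/*", f"{sharelib_dir}/sqoop/"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    commands = build_commands(
        hive_site="/etc/hive/conf/hive-site.xml",            # hypothetical local path
        workflow_dir="/user/me/workflows/sqoop-import",      # hypothetical HDFS path
        sharelib_dir="/user/oozie/share/lib/lib_TIMESTAMP",  # placeholder, check your cluster
    )
    run(commands, dry_run=True)
```

After the copy, Oozie may need to reload its share lib (or be restarted) before the Sqoop action picks up the new jars.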