sqoop export --update-key from hdfs to mssql



我需要使用 update-key 进行 sqoop 导出以更新 mircosoft sql 中的表。

我有一个可以成功导出到 mssql 表的配置单元表:

sqoop export --connect jdbc:sqlserver://{some.ip.address};database={somedatabase} / --username 'someuser' /
--password-file '/some/password/file' /
--table 'Sometable' /
--columns ID,value1,value2 /
--export-dir /apps/hive/warehouse/some.db/Sometable /
--input-fields-terminated-by "||" / -m 2 /
/user/oozie/share/lib/sqoop/sqljdbc4.jar

但是,我希望更新密钥并运行:

sqoop export --connect jdbc:sqlserver://{some.ip.address};database={somedatabase} /
    --username 'someuser' /
    --password-file '/some/password/file' /
    --table 'Sometable' /
    --columns ID,value1,value2 /
    --export-dir /apps/hive/warehouse/some.db/Sometable /
    --input-fields-terminated-by "||" /
    --update-key ID /
    --update-mode allowinsert /
    -m 2 /
    /user/oozie/share/lib/sqoop/sqljdbc4.jar

日志非常没有帮助,(注意:sqoop是通过oozie作业运行的(:

...
5972 [main] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl  - Submitted application application_1485423751090_3566
6016 [main] INFO  org.apache.hadoop.mapreduce.Job  - The url to track the job: http://had003.headquarters.7layer.net:8088/proxy/application_1485423751090_3566/
6016 [main] INFO  org.apache.hadoop.mapreduce.Job  - The url to track the job: http://had003.headquarters.7layer.net:8088/proxy/application_1485423751090_3566/
6017 [main] INFO  org.apache.hadoop.mapreduce.Job  - Running job: job_1485423751090_3566
6017 [main] INFO  org.apache.hadoop.mapreduce.Job  - Running job: job_1485423751090_3566
20284 [main] INFO  org.apache.hadoop.mapreduce.Job  - Job job_1485423751090_3566 running in uber mode : false
20284 [main] INFO  org.apache.hadoop.mapreduce.Job  - Job job_1485423751090_3566 running in uber mode : false
20287 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 0% reduce 0%
20287 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 0% reduce 0%
27001 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 50% reduce 0%
27001 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 50% reduce 0%
Heart beat
37117 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 100% reduce 0%
37117 [main] INFO  org.apache.hadoop.mapreduce.Job  -  map 100% reduce 0%
38139 [main] INFO  org.apache.hadoop.mapreduce.Job  - Job job_1485423751090_3566 failed with state FAILED due to: Task failed task_1485423751090_3566_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

38139 [main] INFO  org.apache.hadoop.mapreduce.Job  - Job job_1485423751090_3566 failed with state FAILED due to: Task failed task_1485423751090_3566_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

38292 [main] INFO  org.apache.hadoop.mapreduce.Job  - Counters: 32
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=338177
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=166
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Failed map tasks=1
        Launched map tasks=2
        Other local map tasks=1
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=16369
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=16369
        Total vcore-milliseconds taken by all map tasks=16369
        Total megabyte-milliseconds taken by all map tasks=25142784
    Map-Reduce Framework
        Map input records=0
        Map output records=0
        Input split bytes=156
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=79
        CPU time spent (ms)=960
        Physical memory (bytes) snapshot=230920192
        Virtual memory (bytes) snapshot=3235606528
        Total committed heap usage (bytes)=162529280
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
38292 [main] INFO  org.apache.hadoop.mapreduce.Job  - Counters: 32
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=338177
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=166
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Failed map tasks=1
        Launched map tasks=2
        Other local map tasks=1
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=16369
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=16369
        Total vcore-milliseconds taken by all map tasks=16369
        Total megabyte-milliseconds taken by all map tasks=25142784
    Map-Reduce Framework
        Map input records=0
        Map output records=0
        Input split bytes=156
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=79
        CPU time spent (ms)=960
        Physical memory (bytes) snapshot=230920192
        Virtual memory (bytes) snapshot=3235606528
        Total committed heap usage (bytes)=162529280
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
38319 [main] INFO  org.apache.sqoop.mapreduce.ExportJobBase  - Transferred 166 bytes in 34.0574 seconds (4.8741 bytes/sec)
38319 [main] INFO  org.apache.sqoop.mapreduce.ExportJobBase  - Transferred 166 bytes in 34.0574 seconds (4.8741 bytes/sec)
38332 [main] INFO  org.apache.sqoop.mapreduce.ExportJobBase  - Exported 0 records.
38332 [main] INFO  org.apache.sqoop.mapreduce.ExportJobBase  - Exported 0 records.
38332 [main] ERROR org.apache.sqoop.mapreduce.ExportJobBase  - Export job failed!
38332 [main] ERROR org.apache.sqoop.mapreduce.ExportJobBase  - Export job failed!
38333 [main] ERROR org.apache.sqoop.tool.ExportTool  - Error during export: Export job failed!
38333 [main] ERROR org.apache.sqoop.tool.ExportTool  - Error during export: Export job failed!

<<< Invocation of Sqoop command completed <<<

 Hadoop Job IDs executed by Sqoop: job_1485423751090_3566

Intercepting System.exit(1)

<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

Oozie Launcher failed, finishing Hadoop job gracefully

Oozie Launcher, uploading action data to HDFS sequence file: hdfs://{Something}
38406 [main] INFO  org.apache.hadoop.io.compress.zlib.ZlibFactory  - Successfully loaded & initialized native-zlib library
38407 [main] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new compressor [.deflate]

Oozie Launcher ends

38538 [main] INFO  org.apache.hadoop.mapred.Task  - Task:attempt_1485423751090_3565_m_000000_0 is done. And is in the process of committing
38538 [main] INFO  org.apache.hadoop.mapred.Task  - Task:attempt_1485423751090_3565_m_000000_0 is done. And is in the process of committing
38601 [main] INFO  org.apache.hadoop.mapred.Task  - Task attempt_1485423751090_3565_m_000000_0 is allowed to commit now
38601 [main] INFO  org.apache.hadoop.mapred.Task  - Task attempt_1485423751090_3565_m_000000_0 is allowed to commit now
38641 [main] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter  - Saved output of task 'attempt_1485423751090_3565_m_000000_0' to hdfs://{Something}
38692 [main] INFO  org.apache.hadoop.mapred.Task  - Task 'attempt_1485423751090_3565_m_000000_0' done.
38692 [main] INFO  org.apache.hadoop.mapred.Task  - Task 'attempt_1485423751090_3565_m_000000_0' done.

有谁知道为什么我不能通过插入更新到 mssql?不支持吗?我做错了什么?

--update-mode allowinsert可能是这里的罪魁祸首。

根据文档:

根据目标数据库的不同,如果要更新数据库中已存在的行或插入行(如果它们尚不存在(,则还可以使用 allowinsert 模式指定 --update-mode 参数。

为了实现这一目标,Sqoop正在使用

 INSERT INTO … ON DUPLICATE KEY UPDATE

因此,并非所有 RDBMS 都支持此更新插入功能(此类查询(。

查看有关 sqoop 中更新插入的更多信息。

最新更新