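I am new to the big data world and to Hadoop. I am trying to run some code I found through Google. It consists of four steps, such as putting the data into the Hadoop file system, then adding an index to the data, and then the main step, which uses map and reduce to produce the reduced data.
I was able to complete the first two steps. The code uses XML to handle the locations.
The code I am using is from http://asterixdb.ics.uci.edu/fuzzyjoin/
When I run the last step, the fuzzy join, it gives me a series of errors.
The trace is attached below: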
hduser@ubuntu:/home/midhu/fuzzyjoin$ cd fuzzyjoin-hadoop
hduser@ubuntu:/home/midhu/fuzzyjoin/fuzzyjoin-hadoop$ hadoop jar target/fuzzyjoin-hadoop-0.0.2-SNAPSHOT.jar fuzzyjoin -conf src/main/resources/fuzzyjoin/dblp.quickstart.xml
16/04/03 13:55:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Complete-Job started: Sun Apr 03 13:55:42 IST 2016
Multi-Job started: Sun Apr 03 13:55:42 IST 2016
FuzzyJoinDriver(TokensBasic.phase1)
Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/records-000}
Output Path: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000
Map Jobs: 2
Reduce Jobs: 1
Properties: {fuzzyjoin.similarity.name=Jaccard
fuzzyjoin.similarity.threshold=.5
fuzzyjoin.tokenizer=Word
fuzzyjoin.tokens.package=Scalar
fuzzyjoin.tokens.lengthstats=false
fuzzyjoin.ridpairs.group.class=TokenIdentity
fuzzyjoin.ridpairs.group.factor=1
fuzzyjoin.data.tokens=
fuzzyjoin.data.joinindex=}
Job started: Sun Apr 03 13:55:42 IST 2016
16/04/03 13:55:42 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/04/03 13:55:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/04/03 13:55:42 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:55:43 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/03 13:55:43 INFO mapreduce.JobSubmitter: number of splits:1
16/04/03 13:55:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1780986358_0001
16/04/03 13:55:44 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/03 13:55:44 INFO mapreduce.Job: Running job: job_local1780986358_0001
16/04/03 13:55:44 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/03 13:55:44 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/04/03 13:55:45 INFO mapred.LocalJobRunner: Waiting for map tasks
16/04/03 13:55:45 INFO mapred.LocalJobRunner: Starting task: attempt_local1780986358_0001_m_000000_0
16/04/03 13:55:46 INFO mapreduce.Job: Job job_local1780986358_0001 running in uber mode : false
16/04/03 13:55:46 INFO mapreduce.Job: map 0% reduce 0%
16/04/03 13:55:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:55:46 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687
16/04/03 13:55:46 INFO mapred.MapTask: numReduceTasks: 1
16/04/03 13:55:49 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/04/03 13:55:49 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/04/03 13:55:49 INFO mapred.MapTask: soft limit at 83886080
16/04/03 13:55:49 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/04/03 13:55:49 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/04/03 13:55:49 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/04/03 13:55:52 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687 > map
16/04/03 13:55:54 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687 > map
16/04/03 13:55:54 INFO mapred.MapTask: Starting flush of map output
16/04/03 13:55:54 INFO mapred.MapTask: Spilling map output
16/04/03 13:55:54 INFO mapred.MapTask: bufstart = 0; bufend = 15588; bufvoid = 104857600
16/04/03 13:55:54 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26209408(104837632); length = 4989/6553600
16/04/03 13:55:54 INFO mapred.MapTask: Finished spill 0
16/04/03 13:55:54 INFO mapred.Task: Task:attempt_local1780986358_0001_m_000000_0 is done. And is in the process of committing
16/04/03 13:55:54 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687
16/04/03 13:55:54 INFO mapred.Task: Task 'attempt_local1780986358_0001_m_000000_0' done.
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Finishing task: attempt_local1780986358_0001_m_000000_0
16/04/03 13:55:54 INFO mapred.LocalJobRunner: map task executor complete.
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Starting task: attempt_local1780986358_0001_r_000000_0
16/04/03 13:55:54 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:55:54 INFO mapreduce.Job: map 100% reduce 0%
16/04/03 13:55:54 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@3209e0
16/04/03 13:55:54 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/04/03 13:55:54 INFO reduce.EventFetcher: attempt_local1780986358_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/04/03 13:55:56 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1780986358_0001_m_000000_0 decomp: 9062 len: 9066 to MEMORY
16/04/03 13:55:56 INFO reduce.InMemoryMapOutput: Read 9062 bytes from map-output for attempt_local1780986358_0001_m_000000_0
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 9062, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->9062
16/04/03 13:55:57 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/04/03 13:55:57 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
16/04/03 13:55:57 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:55:57 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merged 1 segments, 9062 bytes to disk to satisfy reduce memory limit
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merging 1 files, 9066 bytes from disk
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/04/03 13:55:57 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:55:57 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:55:57 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:00 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:00 INFO mapreduce.Job: map 100% reduce 100%
16/04/03 13:56:01 INFO mapred.Task: Task:attempt_local1780986358_0001_r_000000_0 is done. And is in the process of committing
16/04/03 13:56:01 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:01 INFO mapred.Task: Task attempt_local1780986358_0001_r_000000_0 is allowed to commit now
16/04/03 13:56:02 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1780986358_0001_r_000000_0' to hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/_temporary/0/task_local1780986358_0001_r_000000
16/04/03 13:56:02 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:02 INFO mapred.Task: Task 'attempt_local1780986358_0001_r_000000_0' done.
16/04/03 13:56:02 INFO mapred.LocalJobRunner: Finishing task: attempt_local1780986358_0001_r_000000_0
16/04/03 13:56:02 INFO mapred.LocalJobRunner: reduce task executor complete.
16/04/03 13:56:02 INFO mapreduce.Job: Job job_local1780986358_0001 completed successfully
16/04/03 13:56:03 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=1080562
FILE: Number of bytes written=1589660
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=73374
HDFS: Number of bytes written=12847
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=18
Map-Reduce Framework
Map input records=100
Map output records=1248
Map output bytes=15588
Map output materialized bytes=9066
Input split bytes=120
Combine input records=1248
Combine output records=597
Reduce input groups=597
Reduce shuffle bytes=9066
Reduce input records=597
Reduce output records=597
Spilled Records=1194
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=176
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=241836032
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=36687
File Output Format Counters
Bytes Written=12847
Job ended: Sun Apr 03 13:56:04 IST 2016
The job took 21.44 seconds.
FuzzyJoinDriver(TokensBasic.phase2)
Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000}
Output Path: hdfs://localhost:54310/user/hduser/dblp-small/tokens-000
Map Jobs: 2
Reduce Jobs: 1
Properties: {fuzzyjoin.similarity.name=Jaccard
fuzzyjoin.similarity.threshold=.5
fuzzyjoin.tokenizer=Word
fuzzyjoin.tokens.package=Scalar
fuzzyjoin.tokens.lengthstats=false
fuzzyjoin.ridpairs.group.class=TokenIdentity
fuzzyjoin.ridpairs.group.factor=1
fuzzyjoin.data.tokens=
fuzzyjoin.data.joinindex=}
Job started: Sun Apr 03 13:56:04 IST 2016
16/04/03 13:56:04 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:04 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:05 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/03 13:56:05 INFO mapreduce.JobSubmitter: number of splits:1
16/04/03 13:56:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local954589393_0002
16/04/03 13:56:05 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/03 13:56:05 INFO mapreduce.Job: Running job: job_local954589393_0002
16/04/03 13:56:05 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/03 13:56:05 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/04/03 13:56:05 INFO mapred.LocalJobRunner: Waiting for map tasks
16/04/03 13:56:05 INFO mapred.LocalJobRunner: Starting task: attempt_local954589393_0002_m_000000_0
16/04/03 13:56:05 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:56:05 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/part-00000:0+12847
16/04/03 13:56:05 INFO mapred.MapTask: numReduceTasks: 1
16/04/03 13:56:06 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/04/03 13:56:06 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/04/03 13:56:06 INFO mapred.MapTask: soft limit at 83886080
16/04/03 13:56:06 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/04/03 13:56:06 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/04/03 13:56:06 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/04/03 13:56:06 INFO mapred.LocalJobRunner:
16/04/03 13:56:06 INFO mapred.MapTask: Starting flush of map output
16/04/03 13:56:06 INFO mapred.MapTask: Spilling map output
16/04/03 13:56:06 INFO mapred.MapTask: bufstart = 0; bufend = 7866; bufvoid = 104857600
16/04/03 13:56:06 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26212012(104848048); length = 2385/6553600
16/04/03 13:56:06 INFO mapred.MapTask: Finished spill 0
16/04/03 13:56:06 INFO mapred.Task: Task:attempt_local954589393_0002_m_000000_0 is done. And is in the process of committing
16/04/03 13:56:06 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/part-00000:0+12847
16/04/03 13:56:06 INFO mapred.Task: Task 'attempt_local954589393_0002_m_000000_0' done.
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local954589393_0002_m_000000_0
16/04/03 13:56:06 INFO mapred.LocalJobRunner: map task executor complete.
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Waiting for reduce tasks
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Starting task: attempt_local954589393_0002_r_000000_0
16/04/03 13:56:06 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:56:06 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@4950dd
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
16/04/03 13:56:06 INFO reduce.EventFetcher: attempt_local954589393_0002_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
16/04/03 13:56:06 INFO reduce.LocalFetcher: localfetcher#2 about to shuffle output of map attempt_local954589393_0002_m_000000_0 decomp: 9062 len: 9066 to MEMORY
16/04/03 13:56:06 INFO reduce.InMemoryMapOutput: Read 9062 bytes from map-output for attempt_local954589393_0002_m_000000_0
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 9062, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->9062
16/04/03 13:56:06 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
16/04/03 13:56:06 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:56:06 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merged 1 segments, 9062 bytes to disk to satisfy reduce memory limit
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merging 1 files, 9066 bytes from disk
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/04/03 13:56:06 INFO mapred.Merger: Merging 1 sorted segments
16/04/03 13:56:06 INFO mapreduce.Job: Job job_local954589393_0002 running in uber mode : false
16/04/03 13:56:06 INFO mapreduce.Job: map 100% reduce 0%
16/04/03 13:56:06 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:06 INFO mapred.Task: Task:attempt_local954589393_0002_r_000000_0 is done. And is in the process of committing
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1 / 1 copied.
16/04/03 13:56:06 INFO mapred.Task: Task attempt_local954589393_0002_r_000000_0 is allowed to commit now
16/04/03 13:56:06 INFO output.FileOutputCommitter: Saved output of task 'attempt_local954589393_0002_r_000000_0' to hdfs://localhost:54310/user/hduser/dblp-small/tokens-000/_temporary/0/task_local954589393_0002_r_000000
16/04/03 13:56:06 INFO mapred.LocalJobRunner: reduce > reduce
16/04/03 13:56:06 INFO mapred.Task: Task 'attempt_local954589393_0002_r_000000_0' done.
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local954589393_0002_r_000000_0
16/04/03 13:56:06 INFO mapred.LocalJobRunner: reduce task executor complete.
16/04/03 13:56:07 INFO mapreduce.Job: map 100% reduce 100%
16/04/03 13:56:07 INFO mapreduce.Job: Job job_local954589393_0002 completed successfully
16/04/03 13:56:07 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=2179300
FILE: Number of bytes written=3182466
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=99068
HDFS: Number of bytes written=31172
HDFS: Number of read operations=45
HDFS: Number of large read operations=0
HDFS: Number of write operations=30
Map-Reduce Framework
Map input records=597
Map output records=597
Map output bytes=7866
Map output materialized bytes=9066
Input split bytes=126
Combine input records=0
Combine output records=0
Reduce input groups=18
Reduce shuffle bytes=9066
Reduce input records=597
Reduce output records=597
Spilled Records=1194
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=488
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=336207872
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=12847
File Output Format Counters
Bytes Written=5478
Job ended: Sun Apr 03 13:56:07 IST 2016
The job took 3.563 seconds.
Multi-Job ended: Sun Apr 03 13:56:07 IST 2016
The multi-job took 25.128 seconds.
FuzzyJoinDriver(RIDPairsImproved)
Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/records-000}
Output Path: hdfs://localhost:54310/user/hduser/dblp-small/ridpairs-000
Map Jobs: 2
Reduce Jobs: 1
Properties: {fuzzyjoin.similarity.name=Jaccard
fuzzyjoin.similarity.threshold=.5
fuzzyjoin.tokenizer=Word
fuzzyjoin.tokens.package=Scalar
fuzzyjoin.tokens.lengthstats=false
fuzzyjoin.ridpairs.group.class=TokenIdentity
fuzzyjoin.ridpairs.group.factor=1
fuzzyjoin.data.tokens=dblp-small/tokens-000/part-00000
fuzzyjoin.data.joinindex=}
Job started: Sun Apr 03 13:56:08 IST 2016
16/04/03 13:56:08 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:08 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/04/03 13:56:09 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/03 13:56:09 INFO mapreduce.JobSubmitter: number of splits:1
16/04/03 13:56:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1951342027_0003
16/04/03 13:56:16 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/mapred/local/1459671970648/part-00000 <- /home/midhu/fuzzyjoin/fuzzyjoin-hadoop/part-00000
16/04/03 13:56:16 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:54310/user/hduser/dblp-small/tokens-000/part-00000 as file:/tmp/mapred/local/1459671970648/part-00000
16/04/03 13:56:17 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/04/03 13:56:17 INFO mapreduce.Job: Running job: job_local1951342027_0003
16/04/03 13:56:17 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/04/03 13:56:17 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/04/03 13:56:17 INFO mapred.LocalJobRunner: Waiting for map tasks
16/04/03 13:56:17 INFO mapred.LocalJobRunner: Starting task: attempt_local1951342027_0003_m_000000_0
16/04/03 13:56:17 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/04/03 13:56:17 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687
16/04/03 13:56:17 INFO mapred.MapTask: numReduceTasks: 1
16/04/03 13:56:17 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/04/03 13:56:17 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/04/03 13:56:17 INFO mapred.MapTask: soft limit at 83886080
16/04/03 13:56:17 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/04/03 13:56:17 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/04/03 13:56:17 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/04/03 13:56:17 INFO mapred.LocalJobRunner: map task executor complete.
16/04/03 13:56:17 WARN mapred.LocalJobRunner: job_local1951342027_0003
java.lang.Exception: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 10 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 15 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 18 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: file:/tmp/mapred/local/1459671970648/part-00000 (No such file or directory)
at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:60)
at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:40)
at edu.uci.ics.fuzzyjoin.hadoop.ridpairs.token.MapSelfJoin.configure(MapSelfJoin.java:98)
... 23 more
Caused by: java.io.FileNotFoundException: file:/tmp/mapred/local/1459671970648/part-00000 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:45)
... 25 more
16/04/03 13:56:18 INFO mapreduce.Job: Job job_local1951342027_0003 running in uber mode : false
16/04/03 13:56:18 INFO mapreduce.Job: map 0% reduce 0%
16/04/03 13:56:18 INFO mapreduce.Job: Job job_local1951342027_0003 failed with state FAILED due to: NA
16/04/03 13:56:18 INFO mapreduce.Job: Counters: 0
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoinDriver.run(FuzzyJoinDriver.java:179)
at edu.uci.ics.fuzzyjoin.hadoop.ridpairs.RIDPairsImproved.main(RIDPairsImproved.java:108)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoin.bib(FuzzyJoin.java:39)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoin.main(FuzzyJoin.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoinDriver.main(FuzzyJoinDriver.java:121)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I think this is a configuration error in my Hadoop setup on Ubuntu. I used the configuration from this tutorial: http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
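Update: I finally ran the code successfully and fixed the error. The error was caused by running the MapReduce program locally on the machine (note the LocalJobRunner and job_local... entries in the log). I changed it to run on YARN, and the code now works for all types of data.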
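
For anyone hitting the same FileNotFoundException, here is a minimal sketch of what switching from the local runner to YARN typically looks like. It assumes a Hadoop 2.x single-node install where this property goes in mapred-site.xml under the Hadoop configuration directory. The post does not show the exact settings that were changed, so treat this as an illustration rather than the actual fix that was applied.

```xml
<!-- mapred-site.xml (assumed location: the Hadoop configuration directory) -->
<configuration>
  <property>
    <!-- Defaults to "local", which runs jobs in-process through LocalJobRunner.
         Setting it to "yarn" submits jobs to the YARN ResourceManager instead. -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

The YARN daemons have to be running for this to work, and on Hadoop 2.x the NodeManager usually also needs the mapreduce_shuffle auxiliary service enabled in yarn-site.xml. After the change, job IDs in the console output should no longer start with job_local.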