I'm trying to run a standard MapReduce job over the datastore. The Map pipeline runs fine, but the job then gets stuck in ShufflePipeline. I get about 8 error logs like this one:
2013-05-13 08:26:18.154 /mapreduce/kickoffjob_callback 500 19978ms 2kb AppEngine-Google
0.1.0.2 - - [13/May/2013:08:26:18 -0700]
"POST /mapreduce/kickoffjob_callback HTTP/1.1" 500 2511
"http://x.appspot.com/mapreduce/pipeline/run" "AppEngine-Google;
"x" ms=19979 cpu_ms=9814 cpm_usd=0.000281 queue_name=default
task_name=15467899496029413827 app_engine_release=1.8.0
instance=00c61b117c2368b09b3a28374853f2e040692c68
E 2013-05-13 08:26:18.055
Task size must be less than 102400; found 105564
Traceback (most recent call last):
File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1536, in __call__
rv = self.handle_exception(request, response, e)
File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1530, in __call__
rv = self.router.dispatch(request, response)
File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/x/1.367342714947958888/mapreduce/base_handler.py", line 65, in post
self.handle()
File "/base/data/home/apps/x/1.367342714947958888/mapreduce/handlers.py", line 692, in handle
spec, input_readers, output_writers, queue_name, self.base_path())
File "/base/data/home/apps/x/1.367342714947958888/mapreduce/handlers.py", line 767, in _schedule_shards
queue_name=queue_name)
File "/base/data/home/apps/x/1.367342714947958888/mapreduce/handlers.py", line 369, in _schedule_slice
worker_task.add(queue_name, parent=shard_state)
File "/base/data/home/apps/x/1.367342714947958888/mapreduce/util.py", line 265, in add
countdown=self.countdown)
File "/python27_runtime/python27_lib/versions/1/google/appengine/api/taskqueue/taskqueue.py", line 769, in __init__
(max_task_size_bytes, self.size))
TaskTooLargeError: Task size must be less than 102400; found 105564
How can I fix this? It looks like a problem caused by the internals of the MR library and the way it splits up tasks. If so, how do I work around it?
This is a bug. It was fixed here: https://code.google.com/p/appengine-mapreduce/source/detail?r=453
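If you want to see how close a payload is to the limit while you wait to pick up the fix, here is a minimal sketch. It only uses the Python 3 standard library and does not call any App Engine API. The 102400 figure comes from the error message in the log; the helper names `estimate_payload_size` and `fits_in_task` are hypothetical, and the real check in `taskqueue.py` may count more than the form-encoded parameters, so treat the result as a rough lower bound.

```python
from urllib.parse import urlencode

# Limit taken from the error message in the log:
# "Task size must be less than 102400; found 105564".
MAX_TASK_SIZE_BYTES = 102400


def estimate_payload_size(params):
    """Byte length of params once form-encoded (hypothetical helper).

    Assumption: this measures only the encoded parameters. The actual
    task size check may include other parts of the task as well.
    """
    return len(urlencode(params).encode("utf-8"))


def fits_in_task(params, limit=MAX_TASK_SIZE_BYTES):
    """True if the encoded parameters alone are strictly under the limit."""
    return estimate_payload_size(params) < limit


if __name__ == "__main__":
    small = {"shard_id": "0", "slice_id": "0"}
    large = {"input_reader": "x" * 200000}
    print(estimate_payload_size(small), fits_in_task(small))
    print(estimate_payload_size(large), fits_in_task(large))
```

If the estimate is already near the limit, the task will most likely be rejected with the same TaskTooLargeError, since the real size can only be larger than this figure.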