GAE MapReduce ShufflePipeline causes TaskTooLargeError



I am trying to run a standard MapReduce job over the datastore. The map pipeline runs fine, but the job then gets stuck in the ShufflePipeline. I get about 8 error logs like this one:

2013-05-13 08:26:18.154 /mapreduce/kickoffjob_callback 500 19978ms 2kb AppEngine-Google
0.1.0.2 - - [13/May/2013:08:26:18 -0700] 
"POST /mapreduce/kickoffjob_callback HTTP/1.1" 500 2511 
"http://x.appspot.com/mapreduce/pipeline/run" "AppEngine-Google;  
"x" ms=19979 cpu_ms=9814 cpm_usd=0.000281 queue_name=default  
task_name=15467899496029413827 app_engine_release=1.8.0  
instance=00c61b117c2368b09b3a28374853f2e040692c68

E 2013-05-13 08:26:18.055
Task size must be less than 102400; found 105564
Traceback (most recent call last):
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1536, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1530, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/data/home/apps/x/1.367342714947958888/webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/x/1.367342714947958888/mapreduce/base_handler.py", line 65, in post
    self.handle()
  File "/base/data/home/apps/x/1.367342714947958888/mapreduce/handlers.py", line 692, in handle
    spec, input_readers, output_writers, queue_name, self.base_path())
  File "/base/data/home/apps/x/1.367342714947958888/mapreduce/handlers.py", line 767, in _schedule_shards
    queue_name=queue_name)
  File "/base/data/home/apps/x/1.367342714947958888/mapreduce/handlers.py", line 369, in _schedule_slice
    worker_task.add(queue_name, parent=shard_state)
  File "/base/data/home/apps/x/1.367342714947958888/mapreduce/util.py", line 265, in add
    countdown=self.countdown)
  File "/python27_runtime/python27_lib/versions/1/google/appengine/api/taskqueue/taskqueue.py", line 769, in __init__
    (max_task_size_bytes, self.size))
TaskTooLargeError: Task size must be less than 102400; found 105564

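How can I resolve this? The problem appears to come from the internals of the MapReduce library and the way it splits work into tasks. If that is the case, is there a workaround?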

This is a bug. It was fixed here: https://code.google.com/p/appengine-mapreduce/source/detail?r=453
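
To make the failing check concrete, here is a minimal sketch of the kind of size validation that produces the message in the traceback. It is not the App Engine SDK's code and not the fix in the linked revision. The limit of 102400 bytes comes from the error message itself. `check_task_size` and the local `TaskTooLargeError` class are hypothetical stand-ins, and the sketch assumes only the payload is counted, whereas the real task size likely includes other fields as well.

```python
# Limit taken from the error message: "Task size must be less than 102400".
MAX_TASK_SIZE_BYTES = 102400


class TaskTooLargeError(Exception):
    """Local stand-in for the App Engine exception of the same name."""


def check_task_size(payload, max_size=MAX_TASK_SIZE_BYTES):
    """Return the payload size, or raise if it reaches the limit.

    Assumption: only the payload bytes are counted here, so this
    underestimates the size of a real task.
    """
    size = len(payload)
    if size >= max_size:
        raise TaskTooLargeError(
            "Task size must be less than %d; found %d" % (max_size, size))
    return size
```

A payload of 105564 bytes, the size reported in the log, fails this check with the same message. The shard tasks here are built by the library rather than by application code, so the size cannot be reduced from the caller's side; updating to a library version that includes the linked fix is the resolution given above.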
