MRJob: socket.error: [Errno 104] Connection reset by peer



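TL;DR: I get a "socket.error: [Errno 104] Connection reset by peer" exception when using MRJob. The script can actually reach S3, since it does create the bucket and upload a few small files (I checked manually in the AWS console). But the largest file, INPUT, is never uploaded. Hey, it's only 7 GB of test data!

I tried 4 times and it failed every time.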

mrjob==0.4.2

Config

# cat /etc/mrjob.conf 
runners:
  inline:
    base_tmp_dir: /home/tmp
  emr:
    base_tmp_dir: /home/tmp
    aws_access_key_id: [VALID KEY HERE]
    aws_secret_access_key: [VALID SECRET HERE]
    aws_region: us-east-1
    ec2_instance_type: m1.medium
    num_ec2_instances: 7

Traceback

# python /home/bigdata/mr_job_1.py -r emr  /home/filesystem/INPUT > /home/filesystem/OUTPUT
using configs in /etc/mrjob.conf
creating new scratch bucket mrjob-f02b7cd37b2bfffd
using s3://mrjob-f02b7cd37b2bfffd/tmp/ as our scratch dir on S3
creating tmp directory /home/tmp/mr_job_1.root.20131216.152251.298419
writing master bootstrap script to /home/tmp/mr_job_1.root.20131216.152251.298419/b.py
creating S3 bucket 'mrjob-f02b7cd37b2bfffd' to use as scratch space
Copying non-input files into s3://mrjob-f02b7cd37b2bfffd/tmp/mr_job_1.root.20131216.152251.298419/files/
Traceback (most recent call last):
  File "/home/bigdata/workers/process_data/mr_job_1.py", line 178, in <module>
    MRSwapData().run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 494, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 512, in execute
    super(MRJob, self).execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 147, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 208, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 806, in _run
    self._prepare_for_launch()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 817, in _prepare_for_launch
    self._upload_local_files_to_s3()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 905, in _upload_local_files_to_s3
    s3_key.set_contents_from_filename(path)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 1290, in set_contents_from_filename
    encrypt_key=encrypt_key)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 1221, in set_contents_from_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 713, in send_file
    chunked_transfer=chunked_transfer, size=size)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/key.py", line 889, in _send_file_internal
    query_args=query_args
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 547, in make_request
    retry_handler=retry_handler
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 947, in make_request
    retry_handler=retry_handler)
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 908, in _mexe
    raise e
socket.error: [Errno 104] Connection reset by peer

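I ran into this problem too. Looking through the code, I found that boto raises either a PleaseRetryException or one of the exception types in self.http_exceptions. Since it is more than one exception type, and I did not want to import them into my code, I did this instead: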

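import time

# Retry until the boto call succeeds. This catches every exception,
# not only boto's retryable ones, and it never gives up.
should_try_again = True
while should_try_again:
  try:
    method_that_calls_boto()
    should_try_again = False
  except Exception as e:
    print('Exception: %s' % e)
    time.sleep(5)  # wait before the next attempt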
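
The loop above retries forever, so a permanent failure would hang the script. As a rough sketch only, the same idea could be given a cap on attempts and a growing delay. The names retry, max_attempts, base_delay and sleep are made up for this illustration and are not part of boto or mrjob; method_that_calls_boto is the same placeholder as above.

```python
import time


def retry(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    Hypothetical helper, not part of boto or mrjob.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as e:  # broad on purpose, like the loop above
            if attempt == max_attempts:
                raise  # out of attempts: let the last error propagate
            # Double the wait after each failure: 1s, 2s, 4s, ...
            delay = base_delay * 2 ** (attempt - 1)
            print('Attempt %d failed: %s; retrying in %.1fs'
                  % (attempt, e, delay))
            sleep(delay)
```

Usage would look like retry(method_that_calls_boto). Note that this only papers over transient resets; if the upload fails the same way every time, as in the question, retrying alone may not help.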
