Why am I getting a RecursionError when using multiprocessing?



I want to use multiprocessing to geocode a large number of addresses. I have the following code:

import multiprocessing
import geocoder

addresses = ['New York City, NY', 'Austin, TX', 'Los Angeles, CA', 'Boston, MA']  # and on and on

def geocode_worker(address):
    return geocoder.arcgis(address)

def main_process():
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    return pool.map(geocode_worker, addresses)

if __name__ == '__main__':
    main_process()

But it gives me this error:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/anaconda3/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/lib/python3.7/multiprocessing/pool.py", line 470, in _handle_results
    task = get()
  File "/opt/anaconda3/lib/python3.7/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/opt/anaconda3/lib/python3.7/site-packages/geocoder/base.py", line 599, in __getattr__
    if not self.ok:
  File "/opt/anaconda3/lib/python3.7/site-packages/geocoder/base.py", line 536, in ok
    return len(self) > 0
  File "/opt/anaconda3/lib/python3.7/site-packages/geocoder/base.py", line 422, in __len__
    return len(self._list)

The last three frames of the traceback (__getattr__, ok, __len__) repeat over and over, and the final line of the traceback is:

RecursionError: maximum recursion depth exceeded while calling a Python object

Can anyone help me figure out why?

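The problem is that the ArcgisQuery object returned by geocoder is not picklable, or rather, it is not unpicklable. Unpickling falls into an infinite loop because of the class's __getattr__, which internally accesses self.ok, which in turn relies on self._list being defined. During unpickling self._list is not defined, because it is only set in __init__, and __init__ is not called when an object is unpickled. Since the attribute is missing, Python falls back to __getattr__ to look it up, which accesses self.ok again, and the cycle repeats until the recursion limit is hit.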
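The loop described above can be reproduced without geocoder. The following is a minimal sketch, not geocoder's actual code: the class Query and the helper lookup_recurses are hypothetical names, and the toy raises AttributeError when there are no results, which may differ from what geocoder does. Instead of a full pickle round trip it mimics the two steps pickle performs on load: create the instance with __new__ (skipping __init__), then look up __setstate__ on it.

```python
class Query:
    """Toy stand-in for geocoder's query class (hypothetical, not the real code)."""

    def __init__(self):
        self._list = []  # the only place _list is ever set

    def __len__(self):
        return len(self._list)

    @property
    def ok(self):
        return len(self) > 0

    def __getattr__(self, name):
        # Runs only when normal attribute lookup fails.
        if not self.ok:
            raise AttributeError(name)
        return getattr(self._list[0], name)


def lookup_recurses():
    """Return True if looking up a missing attribute recurses without end."""
    # Like pickle on load: build the object without calling __init__ ...
    obj = Query.__new__(Query)
    try:
        # ... then probe for __setstate__. The class does not define it, so
        # __getattr__ runs, touches self.ok, which needs self._list, which is
        # also missing, so __getattr__ runs again, and so on.
        getattr(obj, "__setstate__", None)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    print(lookup_recurses())
```

Note that the default argument to getattr only swallows AttributeError, so the RecursionError propagates, just as it does in the traceback in the question.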

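You can work around this by not passing the ArcgisQuery object itself between the worker processes and the main process, and passing only its underlying __dict__ instead. Then rebuild the ArcgisQuery objects in the main process: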

import multiprocessing
import geocoder
from geocoder.arcgis import ArcgisQuery

addresses = ['New York City, NY', 'Austin, TX', 'Los Angeles, CA', 'Boston, MA']  # and on and on

def geocode_worker(address):
    out = geocoder.arcgis(address)
    return out.__dict__  # Only return the object's __dict__

def main_process():
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    l = pool.map(geocode_worker, addresses)
    out = []
    for d in l:
        q = ArcgisQuery(d['location'])  # location is a required constructor arg
        q.__dict__.update(d)  # Load the rest of our state into the new object
        out.append(q)
    return out

if __name__ == '__main__':
    print(main_process())

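If you don't actually need the whole ArcgisQuery object, and only need certain parts of it, you can also return just those parts from the worker processes, which avoids the need for this hack.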
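As a rough sketch of that simpler route, the worker below copies a few fields into a plain dict before returning. The helper names summarize and geocode_all are made up for illustration, and the choice of fields (ok, address, latlng) is an assumption about what the caller needs; pick whichever attributes you actually use.

```python
import multiprocessing


def summarize(result):
    """Copy a few fields into a plain dict, which pickles without trouble."""
    return {
        "ok": result.ok,
        "address": result.address,
        "latlng": result.latlng,
    }


def geocode_worker(address):
    # Imported here so that summarize() stays usable without geocoder installed.
    import geocoder
    return summarize(geocoder.arcgis(address))


def geocode_all(addresses):
    # The pool only ever ships plain dicts back to the main process.
    with multiprocessing.Pool() as pool:
        return pool.map(geocode_worker, addresses)


if __name__ == "__main__":
    print(geocode_all(["New York City, NY", "Austin, TX"]))
```

Because only built-in types cross the process boundary, the query class's __getattr__ never takes part in unpickling.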

For what it's worth, it looks like geocoder could fix its pickling problem by implementing __getstate__ and __setstate__ on ArcgisQuery or its base class, like this:

def __getstate__(self):
    return self.__dict__

def __setstate__(self, state):
    self.__dict__.update(state)
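To see why those two methods are enough, here is a small self-contained sketch on a toy class. PicklableQuery, Result and roundtrip are hypothetical names for illustration only; this is not geocoder's code and has not been tested against it. Once __setstate__ exists on the class, pickle finds it through normal lookup, so __getattr__ is never consulted on the half-built object.

```python
import pickle


class Result:
    """Toy result holding one field (hypothetical)."""

    def __init__(self, address):
        self.address = address


class PicklableQuery:
    """Toy query class with the same __getattr__ pattern, plus pickle hooks."""

    def __init__(self, results):
        self._list = list(results)

    def __len__(self):
        return len(self._list)

    @property
    def ok(self):
        return len(self) > 0

    def __getattr__(self, name):
        # Delegate unknown attributes to the first result, if there is one.
        if not self.ok:
            raise AttributeError(name)
        return getattr(self._list[0], name)

    def __getstate__(self):
        return self.__dict__

    def __setstate__(self, state):
        # Found on the class, so unpickling never falls back to __getattr__.
        self.__dict__.update(state)


def roundtrip(query):
    """Pickle and unpickle, as multiprocessing does when returning results."""
    return pickle.loads(pickle.dumps(query))


if __name__ == "__main__":
    copy = roundtrip(PicklableQuery([Result("Austin, TX")]))
    print(len(copy), copy.address)
```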
