Python multiprocessing race condition

I ran into a strange error while using concurrent.futures to read from multiple text files.

Here is a small reproducible example:

import os
import concurrent.futures
def read_file(file):
    with open(os.path.join(data_dir, file),buffering=1000) as f:
        for row in f:
            try:
                print(row)
            except Exception as e:
                print(str(e))
if __name__ == '__main__':
    data_dir = os.path.abspath(os.path.join(os.path.dirname(__file__), 'data'))
    files = ['file1', 'file2']
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for file,_ in zip(files,executor.map(read_file,files)):
            pass

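file1 and file2 are arbitrary text files in the data directory.

I get the following error (essentially, a process tries to read the data_dir variable before it has been assigned):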

concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\process.py", line 153, in _process_chunk
    return [fn(*args) for args in chunk]
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\process.py", line 153, in <listcomp>
    return [fn(*args) for args in chunk]
  File "C:\Users\my_username\Downloads\example.py", line 5, in read_file
    with open(os.path.join(data_dir, file),buffering=1000) as f:
NameError: name 'data_dir' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "example.py", line 16, in <module>
    for file,_ in zip(files,executor.map(read_file,files)):
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\_base.py", line 556, in result_iterator
    yield future.result()
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\_base.py", line 405, in result
    return self.__get_result()
  File "C:\Users\my_username\AppData\Local\Continuum\Anaconda3\lib\concurrent\futures\_base.py", line 357, in __get_result
    raise self._exception
NameError: name 'data_dir' is not defined

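If I move the data_dir assignment above the if __name__ == '__main__': block, I do not get this error and the code runs as expected.

What causes this error? Surely data_dir is assigned before any asynchronous call is made in both cases.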

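ProcessPoolExecutor spawns new Python processes, which import the relevant module and call the function you provide. Since data_dir is only defined when you run the module, not when you import it, the error is to be expected.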

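Passing the data_dir file descriptor to read_file as an argument might work, since I believe the process inherits its parent's file descriptors. You would need to check that, though.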
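In the question, data_dir is a plain path string rather than an open file descriptor, so one way to try this suggestion is to hand the string itself to each call. The following is a minimal sketch, not a definitive fix: it assumes read_file is changed to accept data_dir as its first parameter, the read_all helper is hypothetical, and the demo writes two throwaway files to a temporary directory (standing in for data) so that it runs on its own.

```python
import concurrent.futures
import functools
import os
import tempfile


def read_file(data_dir, file):
    """Return the rows of data_dir/file; data_dir is an argument, not a global."""
    with open(os.path.join(data_dir, file)) as f:
        return [row.rstrip('\n') for row in f]


def read_all(data_dir, files):
    """Hypothetical helper: read each file in a worker process, return name -> rows."""
    # A partial of a module-level function can be pickled, so the bound
    # data_dir string is sent to the worker along with every task.
    worker = functools.partial(read_file, data_dir)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return dict(zip(files, executor.map(worker, files)))


if __name__ == '__main__':
    # Throwaway stand-in for the 'data' directory (an assumption of this sketch).
    with tempfile.TemporaryDirectory() as data_dir:
        for name in ('file1', 'file2'):
            with open(os.path.join(data_dir, name), 'w') as f:
                f.write('first row of ' + name + '\n')
        for name, rows in read_all(data_dir, ['file1', 'file2']).items():
            print(name, rows)
```

Because the worker no longer looks up a global, it should not matter where data_dir is assigned or which start method the platform uses.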

If you were to use ThreadPoolExecutor instead, your example should work, because the spawned threads share memory.
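A rough sketch of the threaded variant follows. It keeps read_file reading the global data_dir, as in the question, and only swaps the executor class. The temporary directory and the file contents are assumptions made so the example is self-contained.

```python
import concurrent.futures
import os
import tempfile


def read_file(file):
    # data_dir is resolved as a module global when this line runs. Worker
    # threads share the parent's globals, so the assignment below is visible.
    with open(os.path.join(data_dir, file)) as f:
        return f.read()


if __name__ == '__main__':
    # Throwaway stand-in for the 'data' directory (an assumption of this sketch).
    with tempfile.TemporaryDirectory() as data_dir:
        files = ['file1', 'file2']
        for name in files:
            with open(os.path.join(data_dir, name), 'w') as f:
                f.write('contents of ' + name + '\n')
        with concurrent.futures.ThreadPoolExecutor() as executor:
            for name, text in zip(files, executor.map(read_file, files)):
                print(name, text, end='')
```

Threads suit I/O-bound work such as reading files, but they do not run Python bytecode in parallel on a standard CPython build, so this is a trade-off rather than a drop-in replacement for CPU-heavy tasks.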

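fork() is not available on Windows, so Python uses spawn to start new processes. This launches a new Python interpreter process that shares no memory with the parent, but Python tries to recreate the worker function's environment in the new process, which is why module-level variables work. See the documentation for more details.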
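To see this behavior on any platform, the spawn start method can be requested explicitly through the executor's mp_context parameter. The sketch below is illustrative only: data_path is a hypothetical worker that just builds a path instead of opening a file, and data_dir is derived from the current working directory rather than from __file__ to keep the example short.

```python
import concurrent.futures
import multiprocessing
import os

# Module level: this line runs in the parent and runs again in every spawned
# worker, because spawn re-imports the main module in the child process.
data_dir = os.path.abspath('data')


def data_path(file):
    """Hypothetical worker: build a path from the module-level data_dir."""
    return os.path.join(data_dir, file)


if __name__ == '__main__':
    # Anything assigned only inside this block is NOT re-run in spawned
    # workers, which is what produced the NameError in the question.
    ctx = multiprocessing.get_context('spawn')
    with concurrent.futures.ProcessPoolExecutor(mp_context=ctx) as executor:
        for path in executor.map(data_path, ['file1', 'file2']):
            print(path)
```

Moving the data_dir assignment back under the if __name__ == '__main__': guard in this sketch should reproduce the NameError from the question, regardless of operating system.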
