multiprocessing.pool.ThreadPool memory access errors, or can this be fixed with variadic arguments?



I have the following code, which works fine:

from multiprocessing.pool import ThreadPool
from multiprocessing import Pool
import requests

# Retrieve data using a GET request (url is defined elsewhere)
def get_docs():
    try:
        docs = requests.get(url).json()['data']
        return docs
    except ValueError:
        print("ValueError")

# Index the dictionary for each attribute
def get_attributes(doc):
    x1, x2, x3, x4 = doc["id"], doc["created_utc"], doc["title"], doc["subreddit"]
    return (x1, x2, x3, x4)

# Map each document to the attribute function
def get_data(docs):
    with ThreadPool(4) as pool:
        results = pool.map(get_attributes, docs)
    return results

docs = get_docs()
data = get_data(docs)
print(data)

But what I really want is for get_attributes() to look like this:

def get_attributes(doc):
    """
    Using either Pool or ThreadPool
    """
    with Pool(4) as p:
        results = p.map(some_function(doc), ["id", "created_utc", "title", "subreddit"])
    return results

# Where some_function iteratively maps attributes to one document:
def some_function(doc, arg):
    return doc[arg]

# And then ultimately this should work:
def get_data(docs):
    with ThreadPool(4) as pool:
        results = pool.map(get_attributes, docs)
    return results

Depending on whether Pool or ThreadPool is used inside get_attributes, I get different errors, which have to do with how multiprocessing stores and accesses memory.

But I hope it can be solved with *args or something similar.
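One way to get roughly that shape, sketched with `functools.partial` (the sample `doc` dict below is made up for illustration): `partial` binds `doc` as the fixed first argument of `some_function`, so the inner pool only has to supply each attribute name.

```python
from functools import partial
from multiprocessing.pool import ThreadPool

def some_function(doc, arg):
    return doc[arg]

def get_attributes(doc):
    # partial(some_function, doc) is a picklable-friendly alternative
    # to a lambda: the pool supplies only the attribute name
    with ThreadPool(4) as p:
        results = p.map(partial(some_function, doc),
                        ["id", "created_utc", "title", "subreddit"])
    return results

doc = {"id": 1, "created_utc": 0, "title": "t", "subreddit": "r"}
print(get_attributes(doc))  # → [1, 0, 't', 'r']
```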

So this works:

def get_attributes(doc):
    with ThreadPool(2) as p:
        results = p.map(lambda x: doc[x], ["id", "created_utc", "title", "subreddit"])
    return results

# Map each document to the attribute function
def get_data(docs):
    with ThreadPool(2) as pool:
        results = pool.map(get_attributes, docs)
    return results

But only with ThreadPool; using Pool gives me a pickle error. I'm fairly happy with this result, but if anyone has tips for further optimization, or can explain what pickling is and why I have to use ThreadPool rather than Pool, I'd love to hear it!
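For concreteness, here is the kind of failure I mean by "pickle error", reproduced with the `pickle` module directly and no pool involved. Pool sends work to separate processes, so the mapped function has to be serialized; functions are pickled by their importable name, which a lambda doesn't have.

```python
import pickle

def named(x):
    return x

# A module-level function pickles fine: it is stored by qualified name
pickle.dumps(named)

# A lambda has no importable name, so pickling it fails
try:
    pickle.dumps(lambda x: x)
    print("pickled OK")
except (pickle.PicklingError, AttributeError) as err:
    print("not picklable:", err)
```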
