I have the following code, which runs fine:
from multiprocessing.pool import ThreadPool
from multiprocessing import Pool
import requests

# Retrieve data using a GET request
def get_docs():
    try:
        docs = requests.get(url).json()['data']
        return docs
    except ValueError:
        print("ValueError")

# Index the dictionary for each attribute
def get_attributes(doc):
    x1, x2, x3, x4 = doc["id"], doc["created_utc"], doc["title"], doc["subreddit"]
    return (x1, x2, x3, x4)

# Map each document to the attribute function
def get_data(docs):
    with ThreadPool(4) as pool:
        results = pool.map(get_attributes, docs)
    return results

docs = get_docs()
data = get_data(docs)
print(data)
But what I really want is for get_attributes to look something like this:
def get_attributes(doc):
    """
    Using either Pool or ThreadPool
    """
    with Pool(4) as p:
        results = p.map(some_function(doc), ["id", "created_utc", "title", "subreddit"])
    return results

# Where some_function iteratively maps attributes to one document:
def some_function(doc, arg):
    return doc[arg]

# And then ultimately this should work
def get_data(docs):
    with ThreadPool(4) as pool:
        results = pool.map(get_attributes, docs)
    return results
Depending on whether Pool or ThreadPool is used inside get_attributes, I get different errors, which have to do with how multiprocessing stores/accesses memory. But I was hoping this could be solved with *args or something similar.
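For reference, one standard way to get this "bind one argument, map over the other" behaviour is functools.partial, which freezes doc so the pool only has to supply the attribute name. A minimal sketch using the same field names as above (the sample doc is made up):

```python
from functools import partial
from multiprocessing.pool import ThreadPool

def some_function(doc, arg):
    return doc[arg]

# Hypothetical sample document, just for illustration
doc = {"id": 1, "created_utc": 0, "title": "t", "subreddit": "s"}

with ThreadPool(2) as p:
    # partial(some_function, doc) is a picklable callable of one argument,
    # so map only needs to pass each attribute name
    results = p.map(partial(some_function, doc), ["id", "created_utc", "title", "subreddit"])

print(results)  # [1, 0, 't', 's']
```

Note the difference from `p.map(some_function(doc), ...)` above: that calls some_function immediately (and with a missing argument), whereas partial builds the callable without calling it.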
So this is what works:
def get_attributes(doc):
    with ThreadPool(2) as p:
        results = p.map(lambda x: doc[x], ["id", "created_utc", "title", "subreddit"])
    return results

# Map each document to the attribute function
def get_data(docs):
    with ThreadPool(2) as pool:
        results = pool.map(get_attributes, docs)
    return results
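As an aside on the lambda above: operator.itemgetter builds an equivalent callable that, unlike a lambda, is picklable, and it fetches all four keys in one call, so the inner pool isn't needed at all. A sketch under the same field names (the sample docs list is made up):

```python
from operator import itemgetter
from multiprocessing.pool import ThreadPool

# Equivalent to lambda d: (d["id"], d["created_utc"], d["title"], d["subreddit"]),
# but picklable, so it would also survive being sent to Pool workers
get_attributes = itemgetter("id", "created_utc", "title", "subreddit")

def get_data(docs):
    with ThreadPool(2) as pool:
        results = pool.map(get_attributes, docs)
    return results

# Hypothetical sample data, just for illustration
docs = [{"id": 1, "created_utc": 0, "title": "t", "subreddit": "s"}]
print(get_data(docs))  # [(1, 0, 't', 's')]
```

Since indexing four dict keys is far cheaper than the overhead of spinning up a pool per document, the flat version may also be faster in practice.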
But only with ThreadPool; using Pool gives me a pickle error. I'm happy with this result, but if anyone has tips for further optimization, or can explain what pickle is and why I have to use ThreadPool rather than Pool here, I'd be glad to hear it!
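(For context: pickle is Python's object-serialization protocol. Pool runs tasks in separate processes, so it must pickle the function and its arguments to send them to the workers; ThreadPool shares memory, so nothing is pickled. Functions are pickled by reference, i.e. by their module-qualified name, which a lambda doesn't have. A quick demonstration:)

```python
import pickle

def top_level(x):
    # Defined at module level, so pickle can record it by name
    return x["id"]

print(len(pickle.dumps(top_level)) > 0)  # True: pickled by qualified name

try:
    # A lambda has no importable name, so pickling it fails
    pickle.dumps(lambda x: x["id"])
except (pickle.PicklingError, AttributeError) as e:
    print("lambda not picklable:", type(e).__name__)
```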