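I wrote some code that collects links from a website. It takes about 2 minutes 20 seconds to run, and this is only one function of the program. I'd like to make it faster. I thought about multithreading, but I'm having a hard time understanding it well enough to apply it to this code: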
import re
import requests

def get_manufacturer():
    manufacturers = requests.get("https://www.gsmarena.com/")
    # Grab the rest of the line that holds the brand menu (it contains the Samsung link)
    res = re.findall(r'<li><a href="samsung-phones-9.php">.+\n', manufacturers.text)
    manufacturer_links = re.findall(r'<li><a href="(.+?)">', res[0])
    final_list = []
    for link in manufacturer_links:
        final_list.append("https://www.gsmarena.com/" + link)
    # find pages (iterate over a copy so the page URLs appended below are not fetched again)
    for url in list(final_list):
        req = requests.get(url)  # blocking: each request waits for the previous one to finish
        res2 = re.findall(r"<strong>1</strong>(.+)</div>", req.text)
        if res2:
            pages = re.findall(r'<a href="(.+?)">.</a>', res2[0])
            for page in pages:
                final_list.append("https://www.gsmarena.com/" + page)
    return final_list
You can run the loop in parallel like this:
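import multiprocessing as mul

def calcIntOfnth(i, ppStr, c, znot):
    # Body of one loop iteration goes here; it must be a top-level function
    # so that the worker processes can pickle it.
    ...

if __name__ == "__main__":
    # ppStr, c, znot and k are values already defined in the calling code
    with mul.Pool(mul.cpu_count()) as pool:
        results = pool.starmap(calcIntOfnth, [(i, ppStr, c, znot) for i in range(k)])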
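A runnable toy version of the same pattern may make it easier to see what `starmap` does. The loop body `scaled_power` and its arguments are hypothetical stand-ins for `calcIntOfnth` and `ppStr, c, znot`; the point is only that each iteration depends on nothing but its own arguments, so `starmap` can hand the argument tuples to worker processes and return the results in the original order.

```python
import multiprocessing as mul

def scaled_power(i, base, exponent, offset):
    # Hypothetical loop body: uses only its own arguments,
    # so iterations are independent of each other.
    return (base * i) ** exponent + offset

def run_sequential(k, base, exponent, offset):
    # The plain for-loop version, for comparison.
    return [scaled_power(i, base, exponent, offset) for i in range(k)]

def run_parallel(k, base, exponent, offset):
    args = [(i, base, exponent, offset) for i in range(k)]
    with mul.Pool(mul.cpu_count()) as pool:
        # starmap unpacks each tuple into positional arguments and
        # returns the results in the same order as args.
        return pool.starmap(scaled_power, args)

if __name__ == "__main__":
    print(run_parallel(5, 2, 2, 1))  # prints [1, 5, 17, 37, 65]
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module (Windows, macOS); without it each worker would try to create its own pool.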
You need to rewrite one of your for loops as a function and run it in parallel with a `Pool` object or something similar.
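Applied to the "find pages" loop in the question, a sketch might look like the one below. It swaps in threads (`concurrent.futures.ThreadPoolExecutor`) instead of processes, on the assumption that the time is spent waiting on HTTP responses rather than on CPU work, so threads are enough and avoid pickling overhead. The helper names `extract_page_links`, `find_pages` and `collect_links`, the `fetch` parameter and `max_workers=16` are all hypothetical choices for illustration; `fetch` is any function that takes a URL and returns the page HTML.

```python
import re
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "https://www.gsmarena.com/"

def extract_page_links(html):
    # Same pagination regexes as the question: take the block after the
    # highlighted page "1" and collect the relative links inside it.
    match = re.search(r"<strong>1</strong>(.+)</div>", html)
    if match is None:
        return []
    return [BASE_URL + href for href in re.findall(r'<a href="(.+?)">.</a>', match.group(1))]

def find_pages(url, fetch):
    # One iteration of the original "find pages" loop, as a function.
    return extract_page_links(fetch(url))

def collect_links(manufacturer_urls, fetch, max_workers=16):
    urls = list(manufacturer_urls)
    result = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Requests run concurrently; executor.map still yields the
        # results in the same order as urls.
        for pages in executor.map(lambda u: find_pages(u, fetch), urls):
            result.extend(pages)
    return result
```

With `requests`, the call could be `collect_links(manufacturer_urls, lambda u: requests.get(u).text)`. Keeping `fetch` as a parameter also lets the parsing be tested without network access. A large `max_workers` may get the client rate-limited by the site, so it is worth starting small.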