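My goal is to get the source code of various web pages using the Selenium driver package. To make use of the idle time while a page is opening, I would like to use multiprocessing. However, since I am new to multiprocessing, I can't get the code to work.
Here is a simple function as an example of what I want to run in parallel (it requires the selenium webdriver package and the time package):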
def get_source(links):
    for i in range(len(links)):
        time.sleep(3)  # the time module has no wait(); sleep() pauses execution
        driver.get(links[i])
        time.sleep(3)
        print(driver.page_source)
        time.sleep(3)
        print("Done with the page")
Different web pages are fed into the function, for example:
links = ["https://stackoverflow.com/questions/tagged/javascript","https://stackoverflow.com/questions/tagged/python","https://stackoverflow.com/questions/tagged/c%23","https://stackoverflow.com/questions/tagged/php"]
This is what I have so far. Unfortunately, it just spams instances of the web driver instead of doing what it is supposed to do.
if __name__ == '__main__':
    pool = Pool(2)
    pool.map(get_source(), links)
Any help is much appreciated. Thanks a lot!
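With multiprocessing.Pool, use the apply_async method to map the function to a list of arguments. Note that because the function runs asynchronously, you should pass some kind of index to the function and return it together with the result. In this case, the function returns the URL and the page source.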
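The pattern described above can be tried without a browser. This is only a minimal sketch: `fake_fetch` and `collect` are hypothetical names, and `fake_fetch` stands in for the real page download by returning the link together with its length, so that each result carries its own key.

```python
import multiprocessing as mp


def fake_fetch(link):
    # Hypothetical stand-in for a real download: return the key (the link)
    # together with the "result" so the caller can match them up.
    return (link, len(link))


def collect(links, workers=2):
    # Submit one task per link; apply_async returns immediately with an
    # AsyncResult handle for each submitted task.
    with mp.Pool(processes=workers) as pool:
        handles = [pool.apply_async(fake_fetch, args=(lnk,)) for lnk in links]
        # get() blocks until that particular task has finished.
        return dict(h.get() for h in handles)


if __name__ == "__main__":
    print(collect(["a", "bb", "ccc"]))
```

Note that the function object itself (`fake_fetch`, without parentheses) is handed to the pool. Writing `fake_fetch()` would call it in the parent process before the pool ever sees it.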
Try this code:
import multiprocessing as mp
import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 3 style call. On Selenium 4 the driver path is passed as
# webdriver.Chrome(service=Service(ChromeDriverManager().install())),
# with Service imported from selenium.webdriver.chrome.service.
driver = webdriver.Chrome(ChromeDriverManager().install())

def get_source(link):  # single URL
    time.sleep(3)
    driver.get(link)
    time.sleep(3)
    print("Done with the page:", link)
    return (link, driver.page_source)  # return tuple: link & source

links = [
    "https://stackoverflow.com/questions/tagged/javascript",
    "https://stackoverflow.com/questions/tagged/python",
    "https://stackoverflow.com/questions/tagged/c%23",
    "https://stackoverflow.com/questions/tagged/php"
]

if __name__ == '__main__':
    pool = mp.Pool(processes=2)
    results = [pool.apply_async(get_source, args=(lnk,)) for lnk in links]  # maps function to iterator
    output = [p.get() for p in results]  # collects and returns the results
    for r in output:
        print("len =", len(r[1]), "for link", r[0])  # read tuple elements
Output
Done with the page: https://stackoverflow.com/questions/tagged/python
Done with the page: https://stackoverflow.com/questions/tagged/javascript
Done with the page: https://stackoverflow.com/questions/tagged/c%23
Done with the page: https://stackoverflow.com/questions/tagged/php
len = 163045 for link https://stackoverflow.com/questions/tagged/javascript
len = 161512 for link https://stackoverflow.com/questions/tagged/python
len = 192744 for link https://stackoverflow.com/questions/tagged/c%23
len = 192678 for link https://stackoverflow.com/questions/tagged/php
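One thing to watch: the driver is created at import time, and how that interacts with the worker processes depends on the start method. With spawn, every worker re-imports the module and opens its own browser, and the parent opens one that is never used. With fork, the workers inherit copies of the parent's driver object that all point at the same browser session. A possible variation, shown here only as a sketch, is to create one driver per worker through the pool's `initializer` argument. The names `init_worker` and `scrape` are made up for illustration, and calling `webdriver.Chrome()` with no arguments assumes a Selenium version that can locate a driver by itself (4.6 or later) or a chromedriver on the PATH.

```python
import multiprocessing as mp

_driver = None  # one browser per worker process, set by init_worker


def init_worker():
    # Runs once in every worker process when the pool starts.
    # Assumption: webdriver.Chrome() can find a suitable driver unaided.
    global _driver
    from selenium import webdriver  # imported here so only workers need it
    _driver = webdriver.Chrome()


def get_source(link, driver=None):
    # The optional driver argument exists so the function can be exercised
    # with a stand-in object; workers fall back to their own _driver.
    if driver is None:
        driver = _driver
    driver.get(link)
    return (link, driver.page_source)


def scrape(links, workers=2):
    # pool.map passes each link to get_source and keeps the input order.
    with mp.Pool(processes=workers, initializer=init_worker) as pool:
        return pool.map(get_source, links)
```

Call `scrape(links)` from under `if __name__ == '__main__':`. This sketch never calls `quit()` on the worker drivers, so browser windows may stay open after the pool shuts down; real code would need some cleanup step.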