Can't print the result of a function when using concurrent.futures in a customized way



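I've written a script that uses the concurrent.futures library to print the results of the fetch_links function. When I use a print statement inside the function, I get the expected results. What I'd like to do now is print the results of that function while using a yield statement instead.

Is there any way to change what is under the main block so that it prints the results of fetch_links while leaving the function as it is, that is, keeping the yield statement?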

import requests
from bs4 import BeautifulSoup
import concurrent.futures as cf

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=2&pagesize=50",
    "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=3&pagesize=50",
    "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=4&pagesize=50"
]
base = 'https://stackoverflow.com{}'

def fetch_links(s,link):
    r = s.get(link)
    soup = BeautifulSoup(r.text,"lxml")
    for item in soup.select(".summary .question-hyperlink"):
        # print(base.format(item.get("href")))
        yield base.format(item.get("href"))

if __name__ == '__main__':
    with requests.Session() as s:
        with cf.ThreadPoolExecutor(max_workers=5) as exe:
            future_to_url = {exe.submit(fetch_links,s,url): url for url in links}
            cf.as_completed(future_to_url)

Your fetch_links is a generator function, so each future's result is a generator. You have to loop over that as well to get the results:

import requests
from bs4 import BeautifulSoup
import concurrent.futures as cf

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=2&pagesize=50",
    "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=3&pagesize=50",
    "https://stackoverflow.com/questions/tagged/web-scraping?tab=newest&page=4&pagesize=50"
]
base = 'https://stackoverflow.com{}'

def fetch_links(s, link):
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    for item in soup.select(".summary .question-hyperlink"):
        yield base.format(item.get("href"))

if __name__ == '__main__':
    with requests.Session() as s:
        with cf.ThreadPoolExecutor(max_workers=5) as exe:
            future_to_url = {exe.submit(fetch_links, s, url): url for url in links}
            # as_completed yields each future as it finishes
            for future in cf.as_completed(future_to_url):
                # future.result() is the generator returned by fetch_links;
                # iterate over it to get the individual URLs
                for result in future.result():
                    print(result)

Output:

https://stackoverflow.com/questions/64298886/rvest-webscraping-in-r-with-form-inputs
https://stackoverflow.com/questions/64298879/is-this-site-not-suited-for-web-scraping-using-beautifulsoup
https://stackoverflow.com/questions/64297907/python-3-extract-html-data-from-sports-site
https://stackoverflow.com/questions/64297728/cant-get-the-fully-loaded-html-for-a-page-using-puppeteer
https://stackoverflow.com/questions/64296859/scrape-text-from-a-span-tag-containing-nested-span-tag-in-beautifulsoup
https://stackoverflow.com/questions/64296656/scrapy-nameerror-name-items-is-not-defined
https://stackoverflow.com/questions/64296201/missing-values-while-scraping-using-beautifulsoup-in-python
https://stackoverflow.com/questions/64296130/how-can-i-identify-the-element-containing-the-link-to-my-linkedin-profile-after
https://stackoverflow.com/questions/64295959/why-use-scrapy-or-beautifulsoup-vs-just-parsing-html-with-regex-v2
https://stackoverflow.com/questions/64295842/how-to-retreive-scrapping-data-from-web-to-json-like-format
https://stackoverflow.com/questions/64295559/how-to-iterate-through-a-supermarket-website-and-getting-the-product-name-and-pr
https://stackoverflow.com/questions/64295509/cant-stop-asyncio-request-for-some-delay
https://stackoverflow.com/questions/64295244/paginate-with-network-requests-scraper
and so on ...
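
One thing worth keeping in mind with this approach: calling a generator function only creates a generator object, and its body runs when the generator is iterated. With exe.submit(fetch_links, s, url), the worker thread therefore returns almost immediately, and the request and parsing happen later in whichever thread loops over future.result(). The sketch below is not part of the original answer. It uses a hypothetical, network-free stand-in called fetch_items and a hypothetical wrapper called fetch_items_eagerly to show the difference, recording the thread in which each generator body actually runs.

```python
import concurrent.futures as cf
import threading


def fetch_items(page):
    """Hypothetical stand-in for fetch_links: yields (item, thread name) pairs."""
    # Nothing in this body runs until the generator is iterated.
    thread = threading.current_thread().name
    for i in range(3):
        yield f"page{page}-item{i}", thread


def fetch_items_eagerly(page):
    """Hypothetical wrapper: drain the generator inside the worker thread."""
    return list(fetch_items(page))


def run():
    lazy, eager = [], []
    with cf.ThreadPoolExecutor(max_workers=3, thread_name_prefix="worker") as exe:
        # Submitting the generator function itself only builds a generator object.
        lazy_futures = [exe.submit(fetch_items, p) for p in (2, 3, 4)]
        for future in cf.as_completed(lazy_futures):
            # The generator body executes here, in the thread doing the iteration.
            lazy.extend(future.result())

        # Submitting the wrapper makes the worker do the iteration.
        eager_futures = [exe.submit(fetch_items_eagerly, p) for p in (2, 3, 4)]
        for future in cf.as_completed(eager_futures):
            # The result is already a list that was built in a worker thread.
            eager.extend(future.result())
    return lazy, eager


if __name__ == "__main__":
    lazy, eager = run()
    print({thread for _, thread in lazy})   # name of the thread that consumed the generators
    print({thread for _, thread in eager})  # names of the pool's worker threads
```

In the script above, the same idea would mean leaving fetch_links untouched, yield included, and submitting a small wrapper that returns list(fetch_links(s, link)), so that the requests themselves run in the pool.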
