Multiple requests crash my program (using BeautifulSoup)



I'm writing a Python program that lets the user enter multiple websites, then requests each site, scrapes its title, and prints it. However, as soon as the list goes past 8 websites, the program crashes every time. I'm not sure whether this is a memory issue, and I haven't been able to find anyone else with the same problem. The code is below (I've included a list of 9 URLs so you can copy and paste the code to reproduce the problem).

import requests
from bs4 import BeautifulSoup
lst = ['https://covid19tracker.ca/provincevac.html?p=ON', 'https://www.ontario.ca/page/reopening-ontario#foot-1', 'https://blog.twitter.com/en_us/topics/company/2020/keeping-our-employees-and-partners-safe-during-coronavirus.html', 'https://www.aboutamazon.com/news/company-news/amazons-covid-19-blog-updates-on-how-were-responding-to-the-crisis#covid-latest', 'https://www.bcg.com/en-us/publications/2021/advantages-of-remote-work-flexibility', 'https://news.prudential.com/increasingly-workers-expect-pandemic-workplace-adaptations-to-stick.htm', 'https://www.mckinsey.com/featured-insights/future-of-work/whats-next-for-remote-work-an-analysis-of-2000-tasks-800-jobs-and-nine-countries', 'https://www.gsb.stanford.edu/faculty-research/publications/does-working-home-work-evidence-chinese-experiment', 'https://www.livecareer.com/resources/careers/planning/is-remote-work-here-to-stay']
for websites in range(len(lst)):
    url = lst[websites]
    cite = requests.get(url, timeout=10).content
    soup = BeautifulSoup(cite, 'html.parser')
    title = soup.find('title').get_text().strip()
    print(title)
print("Didn't crash")

The second website doesn't have a title, but don't worry about that.

To avoid the crash, add a user-agent header via the headers= parameter of requests.get(); otherwise the site assumes you're a bot and blocks your request.

cite = requests.get(url, headers=headers, timeout=10).content
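If you want to confirm that the header is really what makes the difference, you can compare status codes with and without it. This is just a quick diagnostic sketch, not part of the fix, and the exact response codes vary by site:

import requests

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36"
}
url = "https://www.bcg.com/en-us/publications/2021/advantages-of-remote-work-flexibility"

# Default requests user agent vs. a browser-like one; a site that blocks
# bots may return 403/503 for the first request and 200 for the second.
print(requests.get(url, timeout=10).status_code)
print(requests.get(url, headers=headers, timeout=10).status_code)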

In your case:

import requests
from bs4 import BeautifulSoup

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36"
}
lst = [
    "https://covid19tracker.ca/provincevac.html?p=ON",
    "https://www.ontario.ca/page/reopening-ontario#foot-1",
    "https://blog.twitter.com/en_us/topics/company/2020/keeping-our-employees-and-partners-safe-during-coronavirus.html",
    "https://www.aboutamazon.com/news/company-news/amazons-covid-19-blog-updates-on-how-were-responding-to-the-crisis#covid-latest",
    "https://www.bcg.com/en-us/publications/2021/advantages-of-remote-work-flexibility",
    "https://news.prudential.com/increasingly-workers-expect-pandemic-workplace-adaptations-to-stick.htm",
    "https://www.mckinsey.com/featured-insights/future-of-work/whats-next-for-remote-work-an-analysis-of-2000-tasks-800-jobs-and-nine-countries",
    "https://www.gsb.stanford.edu/faculty-research/publications/does-working-home-work-evidence-chinese-experiment",
    "https://www.livecareer.com/resources/careers/planning/is-remote-work-here-to-stay",
]
for websites in range(len(lst)):
    url = lst[websites]
    cite = requests.get(url, headers=headers, timeout=10).content
    soup = BeautifulSoup(cite, "html.parser")
    title = soup.find("title").get_text().strip()
    print(title)
print("Didn't crash")
