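I'm trying to scrape some information with Beautiful Soup. I put a try…except inside a for loop, but it doesn't seem to do its job. I must be doing something wrong, but I can't see where.
This gets the HTML from a list of URLs named occupations_list. Example URL: https://candidat.pole-emploi.fr/offres/emploi/horticulteur/s1m1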
    import requests
    from bs4 import BeautifulSoup

    for occupation in occupations_list:
        offers_page = requests.get(occupation)
        offers_soup = BeautifulSoup(offers_page.content, 'lxml')
        offers = offers_soup.find('ul', class_='result-list list-unstyled')
This then pulls a headline out of the HTML fetched above:
    for job in offers:
        try:
            headline = job.find('h2', class_='t4 media-heading').text
        except Exception as e:
            pass
        print(headline)
The problem is that after a few headlines have been scraped, I get the following error message:
    TypeError                                 Traceback (most recent call last)
    <ipython-input-77-cbf6b87ac0f9> in <module>()
          3     offres_soup = BeautifulSoup(offres_page.content, 'lxml')
          4     offres = offres_soup.find('ul', class_='result-list list-unstyled')
    ----> 5     for job in offres:
          6         try:
          7             headline = job.find('h2', class_='t4 media-heading').text

    TypeError: 'NoneType' object is not iterable
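None means that find() found nothing on that page. The try…except does not help here because the error is raised by the for statement itself, which sits outside the try block. Instead of try…except, you can use an if … is None check and skip the page when nothing was found, like this: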
    for occupation in occupations_list:
        offers_page = requests.get(occupation)
        offers_soup = BeautifulSoup(offers_page.content, 'lxml')
        offers = offers_soup.find('ul', class_='result-list list-unstyled')
        if offers is None:
            continue
        print("Processing offers")
Replace print("Processing offers") with the actual processing.
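For what it's worth, here is a rough sketch of what that processing step might look like, with the None check folded into a helper. The function name extract_headlines and the sample markup are made up for illustration, and the real page structure may differ. It uses 'html.parser' only so it runs without lxml installed, and it swaps the loop over the ul's children for find_all on the h2 tags, which is a change from the code in the question.

```python
from bs4 import BeautifulSoup


def extract_headlines(html):
    """Return the headline text of every offer in one results page."""
    # 'html.parser' is used here only so the sketch runs without lxml installed.
    soup = BeautifulSoup(html, 'html.parser')
    offers = soup.find('ul', class_='result-list list-unstyled')
    if offers is None:
        # No results list on this page: nothing to iterate over.
        return []
    # find_all returns only matching tags, so whitespace text nodes
    # between the <li> elements are never visited.
    headlines = offers.find_all('h2', class_='t4 media-heading')
    return [h2.get_text(strip=True) for h2 in headlines]


# Hypothetical markup, made up for illustration; the real page may differ.
sample_page = """
<ul class="result-list list-unstyled">
  <li><h2 class="t4 media-heading">Horticulteur (H/F)</h2></li>
  <li><p>no headline here</p></li>
  <li><h2 class="t4 media-heading"> Ouvrier horticole </h2></li>
</ul>
"""
empty_page = "<html><body><p>Aucune offre</p></body></html>"

print(extract_headlines(sample_page))
print(extract_headlines(empty_page))
```

Inside the loop over occupations_list you would then call extract_headlines(offers_page.content) for each URL. A page without a results list simply gives back an empty list instead of raising the TypeError.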