Parsing HTML with BeautifulSoup: select()



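I am trying to get the content of the latest post using BeautifulSoup.
Sometimes the latest post has a tag, and sometimes it does not.
If the tag is there, I want to get it; if it is not, I just want the rest of the text.
My code is below.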

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
url = "https:// "
req = requests.get(url, headers=headers)
html = req.text
soup = BeautifulSoup(html, 'html.parser')
link = soup.select('#flagList > div.clear.ab-webzine > div > a')
title = soup.select('#flagList > div.clear.ab-webzine > div > div.wz-item-header > a > span')
latest_link = link[0]['href']  # link of latest post (assumes the href is an absolute URL)
latest_title = title[0].text  # title of latest post
# to get the text of latest post
t_url = latest_link
t_req = requests.get(t_url, headers=headers)
t_html = t_req.text
t_soup = BeautifulSoup(t_html, 'html.parser')
maintext = t_soup.select('#flagArticle > div.rhymix_content.xe_content')
# select_one() returns None when nothing matches, so .get_text() fails here
tag = t_soup.select_one('div.rd.clear > div.rd_body.clear > ul > li > a').get_text()
print(maintext)
print(tag)

The problem is that when the latest post has no tag, the code raises the following error:
AttributeError: 'NoneType' object has no attribute 'get_text'

If I remove .get_text() from the code and the latest post has no tag, it returns None.
If the tag exists, it returns <a href="/posts?search_target=tag&amp;search_keyword=ABC">ABC</a>,
but I only want to get ABC.

How can I solve this?

Try this. It wraps the tag lookup in a try/except so that a missing tag no longer stops the script:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
url = "https:// "
req = requests.get(url, headers=headers)
html = req.text
soup = BeautifulSoup(html, 'html.parser')
link = soup.select('#flagList > div.clear.ab-webzine > div > a')
title = soup.select('#flagList > div.clear.ab-webzine > div > div.wz-item-header > a > span')
latest_link = link[0]['href']  # link of latest post (assumes the href is an absolute URL)
latest_title = title[0].text  # title of latest post
# to get the text of latest post
t_url = latest_link
t_req = requests.get(t_url, headers=headers)
t_html = t_req.text
t_soup = BeautifulSoup(t_html, 'html.parser')
maintext = t_soup.select('#flagArticle > div.rhymix_content.xe_content')
try:
    # .text raises AttributeError when select_one() found nothing (None)
    tag = t_soup.select_one('div.rd.clear > div.rd_body.clear > ul > li > a').text
    print(tag)
except AttributeError:
    print("Sure the tag exists on this page??")
print(maintext)
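
As an alternative to try/except, the result of select_one() can be checked for None before reading its text. The following is only a minimal sketch: the helper name extract_tag_text, its default parameter, and the two inline HTML samples are assumptions made up for illustration (the real page markup is not shown in the question); only the selector is taken from the code above.

```python
from bs4 import BeautifulSoup

# Selector copied from the question; the sample HTML below is invented to match it.
TAG_SELECTOR = 'div.rd.clear > div.rd_body.clear > ul > li > a'


def extract_tag_text(soup, selector=TAG_SELECTOR, default=None):
    """Return the stripped text of the first match, or `default` if nothing matches."""
    element = soup.select_one(selector)  # a Tag if found, otherwise None
    if element is None:
        return default
    return element.get_text(strip=True)


# Hypothetical post that contains a tag link.
with_tag = BeautifulSoup(
    '<div class="rd clear"><div class="rd_body clear"><ul><li>'
    '<a href="/posts?search_target=tag&amp;search_keyword=ABC">ABC</a>'
    '</li></ul></div></div>',
    'html.parser',
)

# Hypothetical post without any tag link.
without_tag = BeautifulSoup(
    '<div class="rd clear"><div class="rd_body clear"><p>No tags here</p></div></div>',
    'html.parser',
)

print(extract_tag_text(with_tag))
print(extract_tag_text(without_tag, default=''))
```

In the script above this would presumably be used as tag = extract_tag_text(t_soup), followed by printing tag only when it is not None. Compared with a try/except, the explicit check handles only the "no match" case and does not hide unrelated errors.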
