get_text() fails in Python/BeautifulSoup



In Python/BeautifulSoup, the element I expect title to hold is:

<span class="ux-textspans"><!--F#f_7[0]-->4K Photon MONO<!--F/--></span> 

When I call title.get_text() to get the text 4K Photon MONO, it fails. Can anyone help? Thanks!

import requests
from bs4 import BeautifulSoup

url = 'https://www.ebay.com/itm/284163810059'
req = requests.get(url)
soup = BeautifulSoup(req.text, 'lxml')
title = soup.select('div > div:nth-child(2) > div:nth-child(4) > div > span > div > span')
title_text = title.get_text()  # this is the line that fails

You can also use the soup.find function.

import requests
from bs4 import BeautifulSoup

url = 'https://www.ebay.com/itm/284163810059'
req = requests.get(url)
soup = BeautifulSoup(req.text, 'lxml')
# find() returns the first matching Tag, or None if nothing matches
title = soup.find("span", {"itemprop": "model"})
title_text = "" if title is None else title.get_text()

This happens because select returns a list of elements rather than a single element. To fix it:

import requests
from bs4 import BeautifulSoup

url = 'https://www.ebay.com/itm/284163810059'
req = requests.get(url)
soup = BeautifulSoup(req.text, 'lxml')
title = soup.select('div > div:nth-child(2) > div:nth-child(4) > div > span > div > span')
# Call get_text() on each matched element and join the results
text = ''.join(t.get_text() for t in title)
print(text)
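
To see the difference without hitting the network, here is a minimal sketch that parses a small inline snippet. The markup is hypothetical: it is modelled on the span from the question plus one made-up second span, and it is not the real eBay page. It uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the span from the question (not the real page).
html = """
<div>
  <span class="ux-textspans"><!--F#f_7[0]-->4K Photon MONO<!--F/--></span>
  <span class="ux-textspans">Second span</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() always returns a list-like ResultSet, even when only one element matches.
matches = soup.select("span.ux-textspans")

try:
    matches.get_text()  # the ResultSet itself has no get_text()
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

# Call get_text() on each element instead; HTML comments are left out of the text.
texts = [tag.get_text(strip=True) for tag in matches]
print(error_name, texts)
```

Indexing the result (for example matches[0]) also works, but it raises IndexError when nothing matched, so check the length first.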

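The main problem is that select returns a ResultSet, and you cannot call get_text() on it directly; you have to iterate over it and call the method on each element. A second problem is your selector, which could be more specific: a long chain of positional div:nth-child steps breaks as soon as the page layout changes.

So how do you fix it? Use select_one() instead of select(). It returns a single element, so you can call get_text() on it directly: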

soup.select_one('[itemprop="model"]')
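
As a rough illustration, the sketch below runs that selector against a hypothetical inline snippet rather than the live page. The itemprop="brand" lookup is made up purely to show what a miss looks like.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the itemprop="model" attribute mirrors the selector above.
html = '<div><span itemprop="model"><!--F#f_7[0]-->4K Photon MONO<!--F/--></span></div>'
soup = BeautifulSoup(html, "html.parser")

element = soup.select_one('[itemprop="model"]')  # first matching Tag
missing = soup.select_one('[itemprop="brand"]')  # nothing matches in this snippet

model = element.get_text(strip=True)
print(model, missing)
```

Because a miss yields None rather than an empty list, calling get_text() on the result without a check raises AttributeError.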

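Note that you should always check that the element you are trying to select actually exists:

title = e.get_text() if (e := soup.select_one('[itemprop="model"]')) else None

Note: the walrus operator (:=) requires Python 3.8 or later.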

For Python < 3.8:

e = soup.select_one('[itemprop="model"]')
title = e.get_text() if e else None

Complete example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.ebay.com/itm/284163810059'
req = requests.get(url)
# Name the parser explicitly; without it BeautifulSoup warns and picks one itself
soup = BeautifulSoup(req.text, 'lxml')
# select_one() returns the first match or None (walrus operator, Python 3.8+)
title = e.get_text() if (e := soup.select_one('[itemprop="model"]')) else None
print(title)

Output:

4K Photon MONO
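
If the same availability check is needed for several selectors, it can be wrapped in a small helper. The sketch below is one possible shape, not part of BeautifulSoup: text_or_none is a hypothetical name, and the inline markup is made up for the demonstration. It avoids the walrus operator, so it should also run on Python versions before 3.8.

```python
from typing import Optional

from bs4 import BeautifulSoup


def text_or_none(soup: BeautifulSoup, selector: str) -> Optional[str]:
    """Return the stripped text of the first match, or None if nothing matches.

    Hypothetical helper for illustration; it is not a BeautifulSoup API.
    """
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element is not None else None


# Hypothetical markup standing in for the downloaded page.
html = '<span itemprop="model">4K Photon MONO</span>'
soup = BeautifulSoup(html, "html.parser")

found = text_or_none(soup, '[itemprop="model"]')
not_found = text_or_none(soup, '[itemprop="brand"]')
print(found, not_found)
```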
