In Python/BeautifulSoup, the value of title is shown below:
<span class="ux-textspans"><!--F#f_7[0]-->4K Photon MONO<!--F/--></span>
When I call title.get_text() to get the text 4K Photon MONO, it fails. Can anyone help? Thanks!
import requests
from bs4 import BeautifulSoup
url='https://www.ebay.com/itm/284163810059'
req=requests.get(url)
soup=BeautifulSoup(req.text,'lxml')
title=soup.select('div > div:nth-child(2) > div:nth-child(4) > div > span > div > span')
title_text= title.get_text()
You can also use the soup.find function, which returns a single element (or None if nothing matches):
import requests
from bs4 import BeautifulSoup
url='https://www.ebay.com/itm/284163810059'
req=requests.get(url)
soup=BeautifulSoup(req.text,'lxml')
title=soup.find("span", {"itemprop" : "model"})
title_text= "" if title is None else title.get_text()
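To try the find approach without hitting the network, here is a minimal sketch on an inline snippet. The markup is an assumption modeled on the span from the question (wrapped in a hypothetical itemprop="model" element); the real eBay page may be structured differently. It uses the standard-library html.parser instead of lxml only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the product page; the real markup may differ.
html = (
    '<div><span itemprop="model">'
    '<span class="ux-textspans"><!--F#f_7[0]-->4K Photon MONO<!--F/--></span>'
    '</span></div>'
)
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching Tag, or None when nothing matches.
title = soup.find("span", {"itemprop": "model"})
title_text = "" if title is None else title.get_text()

# A lookup that matches nothing falls back to the empty string.
brand = soup.find("span", {"itemprop": "brand"})
brand_text = "" if brand is None else brand.get_text()

print(repr(title_text), repr(brand_text))
```

The HTML comment nodes inside the span are not part of the text returned by get_text(), so they do not need to be stripped separately.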
This happens because select returns a list of elements, not a single element. Solution:
import requests
from bs4 import BeautifulSoup
url='https://www.ebay.com/itm/284163810059'
req=requests.get(url)
soup=BeautifulSoup(req.text,'lxml')
title=soup.select('div > div:nth-child(2) > div:nth-child(4) > div > span > div > span')
text = ''.join(t.get_text() for t in title)
print(text)
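The main issue is that you are using select, which returns a ResultSet, and you cannot call get_text() or use .text on it until you iterate over it and call the method on each element. A second issue is your selector, which could be more specific.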
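The difference can be reproduced offline with a small sketch. The inline HTML below is the span from the question wrapped in a div; the selector span.ux-textspans is an assumption chosen for this snippet, not necessarily the best selector for the live page.

```python
from bs4 import BeautifulSoup

# Inline stand-in for the page, based on the span shown in the question.
html = '<div><span class="ux-textspans"><!--F#f_7[0]-->4K Photon MONO<!--F/--></span></div>'
soup = BeautifulSoup(html, "html.parser")

# select() always returns a list-like ResultSet, even for a single match.
matches = soup.select("span.ux-textspans")

try:
    matches.get_text()          # the call that fails in the question
    failed = False
except AttributeError:
    failed = True               # a ResultSet has no get_text()

# Calling get_text() on each element of the ResultSet works.
texts = [tag.get_text() for tag in matches]

# select_one() returns a single Tag (or None), so get_text() works directly.
first = soup.select_one("span.ux-textspans")

print(failed, texts, first.get_text())
```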
So how do you fix it? Use select_one() instead of select(), and then you can call get_text() directly on the result:
soup.select_one('[itemprop="model"]')
Note that you should always check whether the element you are trying to select is actually present:
title = e.get_text() if (e:= soup.select_one('[itemprop="model"]')) else None
Note: the walrus operator (:=) requires Python 3.8 or later.
For Python < 3.8:
title = soup.select_one('[itemprop="model"]').get_text() if soup.select_one('[itemprop="model"]') else None
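If the same check is needed in several places, it could be wrapped in a small helper that works on any Python 3 version and runs the selector only once. This is only a sketch: text_or_none is a hypothetical name, not part of BeautifulSoup, and the inline HTML is a stand-in for the real page.

```python
from typing import Optional

from bs4 import BeautifulSoup


def text_or_none(soup: BeautifulSoup, selector: str) -> Optional[str]:
    """Return the text of the first element matching selector, or None."""
    element = soup.select_one(selector)  # a single Tag, or None if no match
    return element.get_text() if element is not None else None


# Hypothetical sample markup; the real page may differ.
sample = BeautifulSoup('<span itemprop="model">4K Photon MONO</span>', "html.parser")

found = text_or_none(sample, '[itemprop="model"]')
missing = text_or_none(sample, '[itemprop="brand"]')
print(found, missing)
```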
import requests
from bs4 import BeautifulSoup
url='https://www.ebay.com/itm/284163810059'
req=requests.get(url)
soup=BeautifulSoup(req.text,'lxml')
title = e.get_text() if (e:= soup.select_one('[itemprop="model"]')) else None
title
Output: 4K Photon MONO