lxml: get all items that share an XPath



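I'm trying to scrape all the prices from a website using XPath. All the prices share the same XPath, but only `[0]` (which I assume is the first item) works… Let me show you: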

```python
webpage = requests.get(URL, headers=HEADERS)
soup = BeautifulSoup(webpage.content, "html.parser")
dom = etree.HTML(str(soup))
print(dom.xpath('/html/body/div[1]/div[5]/div/div/div/div[1]/ul/li[1]/article/div[1]/div[2]/div')[0].text)
```

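This successfully prints the first price! I tried changing the `[0]` before `.text` to `[1]` to print the second item, but that raises "index out of range". Then I tried to come up with a for loop that prints every item, so that I could compute an average.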
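The `[1]` most likely fails because of the path rather than the indexing: in XPath, `li[1]` means "the first `<li>` child", so the expression matches a single element and the list returned by `xpath()` has length 1. Removing the positional predicate from the step that repeats (here `li`) returns one match per item, which can then be looped over and averaged. A minimal sketch on a small inline document; the markup and prices are made up for illustration and do not come from the real page:

```python
from statistics import mean

from lxml import etree

# Hypothetical markup: three <li> items with identical structure,
# each holding one price.
HTML = """
<html><body>
  <ul>
    <li><div class="price">199.99</div></li>
    <li><div class="price">249.50</div></li>
    <li><div class="price">310.00</div></li>
  </ul>
</body></html>
"""

dom = etree.HTML(HTML)

# li[1] selects only the first <li>, so this matches exactly one <div>;
# first_only[1] would raise IndexError.
first_only = dom.xpath("/html/body/ul/li[1]/div")
print(len(first_only))  # 1

# Without the [1] on <li>, the same path matches the <div> in every <li>.
all_divs = dom.xpath("/html/body/ul/li/div")
prices = [float(div.text) for div in all_divs]

for price in prices:
    print(price)
print("average:", round(mean(prices), 2))
```

On the real page, the step to generalise is whichever one repeats once per product; which step that is would need to be confirmed against the live markup.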

Any help would be greatly appreciated!

Sorry, edit: here is the code

```python
from bs4 import BeautifulSoup
from lxml import etree
import requests

URL = "https://www.newegg.com/p/pl?d=GPU&N=601357247%20100007709"

# HEADERS = ...  you need to add your own headers here; I'm not allowed to post mine.

webpage = requests.get(URL, headers=HEADERS)
soup = BeautifulSoup(webpage.content, "html.parser")
dom = etree.HTML(str(soup))

print(dom.xpath('/html/body/div[10]/div[4]/section/div/div/div[2]/div/div/div/div[2]/div/div[2]/div[2]/div[1]/div/div[2]/ul/li[3]/strong')[0].text)
```
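An absolute path like this one breaks as soon as any ancestor shifts, and positional steps such as `div[10]` and `li[3]` pin it to a single spot. A sketch of a relative XPath that selects by class instead. It assumes, by combining the answer's `.price-current` class with the `strong` at the end of this path, that each price sits in an `<li class="price-current">` with the dollars in `<strong>`; the cents in `<sup>` and the sample markup are hypothetical, so check the live page before relying on them:

```python
from lxml import etree

# Hypothetical product-card markup; the real class names and nesting may differ.
HTML = """
<div class="item-container">
  <a class="item-title">Card A</a>
  <ul class="price"><li class="price-current">$<strong>1,299</strong><sup>.99</sup></li></ul>
</div>
<div class="item-container">
  <a class="item-title">Card B</a>
  <ul class="price"><li class="price-current">$<strong>549</strong><sup>.00</sup></li></ul>
</div>
"""

dom = etree.HTML(HTML)

# Match <li> elements whose class list contains the exact token "price-current",
# wherever they sit in the tree, instead of spelling out every ancestor.
has_class = "contains(concat(' ', normalize-space(@class), ' '), ' price-current ')"
price_items = dom.xpath(f"//li[{has_class}]")

# Join the dollars (<strong>) and cents (<sup>) of each matched item.
prices = [li.findtext("strong") + li.findtext("sup") for li in price_items]
print(prices)
```

The `concat(' ', normalize-space(@class), ' ')` idiom is used because a plain `contains(@class, 'price-current')` would also match longer class names such as `price-current-num`.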

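You can use CSS selectors, which are more readable in this case. I would also remove some of the offer info so that only the actual price is left.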

```python
import requests
from bs4 import BeautifulSoup as bs
from pprint import pprint

r = requests.get("https://www.newegg.com/p/pl?d=GPU&N=601357247%20100007709", headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.text, features="lxml")
prices = {}
for i in soup.select('.item-container'):
    # Remove the offer info (.price-current-num) so only the price remains
    if a := i.select_one('.price-current-num'):
        a.decompose()
    # Map product title -> price text; [:-1] trims the last character
    prices[i.select_one('.item-title').text] = i.select_one('.price-current').get_text(strip=True)[:-1]
pprint(prices)
```
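The same CSS approach can be tried offline. `select()` returns every matching element (the CSS counterpart of dropping a positional predicate in XPath), while `select_one()` returns only the first. The markup below is a hypothetical stand-in for a product card; in particular, `.price-current-num` is assumed to hold an offer count such as "(3 Offers)". The sample has nothing after the price, so the `[:-1]` slice is not needed here:

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on a product grid; class names are
# assumptions and should be checked against the live page.
HTML = """
<div class="item-container">
  <a class="item-title">Card A</a>
  <ul class="price">
    <li class="price-current">$<strong>1,299</strong><sup>.99</sup>
      <a class="price-current-num">(3 Offers)</a></li>
  </ul>
</div>
<div class="item-container">
  <a class="item-title">Card B</a>
  <ul class="price">
    <li class="price-current">$<strong>549</strong><sup>.00</sup></li>
  </ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

prices = {}
for card in soup.select(".item-container"):  # every card, not just the first
    current = card.select_one(".price-current")
    # Drop the offer-count link, if present, so only the price text remains.
    if offers := current.select_one(".price-current-num"):
        offers.decompose()
    title = card.select_one(".item-title").get_text(strip=True)
    # strip=True strips each text piece and skips whitespace-only ones.
    prices[title] = current.get_text(strip=True)

print(prices)
```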

Prices as a list of floats:

```python
import re
import requests
from bs4 import BeautifulSoup as bs
from pprint import pprint

r = requests.get("https://www.newegg.com/p/pl?d=GPU&N=601357247%20100007709", headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.text, features="lxml")
prices = []
for i in soup.select('.item-container'):
    # Remove the offer info (.price-current-num) so only the price remains
    if a := i.select_one('.price-current-num'):
        a.decompose()
    # [$,] strips "$" and "," literally; a bare '$' would be a regex end anchor
    prices.append(float(re.sub(r'[$,]', '', i.select_one('.price-current').get_text(strip=True)[:-1])))
pprint(prices)
```
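Since the goal was an average, here is that last step in isolation. It also shows a regex pitfall worth knowing when cleaning price strings: outside a character class, `$` is an end-of-string anchor, so a pattern like `'$|,'` removes commas but leaves the dollar sign, and `float()` then raises `ValueError`. A character class such as `[$,]` matches both characters literally. The price strings are made-up sample values:

```python
import re
from statistics import mean

# Made-up price strings in the "$1,234.56" shape scraped from a listing.
raw = ["$1,299.99", "$549.00", "$329.99"]

# A bare "$" is an anchor: only the comma is removed, the dollar sign stays.
print(re.sub('$|,', '', raw[0]))  # $1299.99

# Inside [...] "$" is literal, so both "$" and "," are stripped.
prices = [float(re.sub(r"[$,]", "", p)) for p in raw]
print(prices)
print(f"average: {mean(prices):.2f}")
```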
