Web Scraping with BeautifulSoup - Issue capturing different tags that start with the same class name



I'm trying to capture only the tags with the class "a-price" from the first page of search results for "iphone" on the Amazon website.

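However, the results also include tags whose class attribute starts with "a-price" but carries another class as well, such as "a-price a-text-price". How can I make my scraping code ignore these tags?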

Here is the scraping code:

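from bs4 import BeautifulSoup

# resp: the HTTP response for the search page (not shown in the question)
s = BeautifulSoup(resp.content, features="lxml")
prices = s.find_all("span", attrs={"class": "a-price"})
print(prices)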
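To see why the extra tags show up, here is a small offline reproduction. It assumes a trimmed-down, hypothetical HTML snippet modeled on the output below (not a live Amazon page) and uses the stdlib html.parser so lxml is not required. BeautifulSoup treats class as a multi-valued attribute, so a filter for "a-price" matches any span whose class list contains that value.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modeled on the search-result markup in the question
HTML = """
<span class="a-price"><span class="a-offscreen">R$6.226,87</span></span>
<span class="a-price a-text-price"><span class="a-offscreen">R$6.628,98</span></span>
<span class="a-price"><span class="a-offscreen">R$5.099,90</span></span>
"""

soup = BeautifulSoup(HTML, "html.parser")

# class is multi-valued: "a-price a-text-price" is parsed as ["a-price", "a-text-price"],
# so the strike-through span also matches a filter on "a-price"
matches = soup.find_all("span", attrs={"class": "a-price"})
class_lists = [tag["class"] for tag in matches]
print(class_lists)
```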

Here is the printed result:

[<span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$6.226,87</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">6.226<span class="a-price-decimal">,</span></span><span class="a-price-fraction">87</span></span></span>, <span class="a-price a-text-price" data-a-color="secondary" data-a-size="b" data-a-strike="true"><span class="a-offscreen">R$6.628,98</span><span aria-hidden="true">R$6.628,98</span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$5.099,90</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">5.099<span class="a-price-decimal">,</span></span><span class="a-price-fraction">90</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$1.460,00</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">1.460<span class="a-price-decimal">,</span></span><span class="a-price-fraction">00</span></span></span>, <span class="a-price a-text-price" data-a-color="secondary" data-a-size="b" data-a-strike="true"><span class="a-offscreen">R$1.899,00</span><span aria-hidden="true">R$1.899,00</span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$7.488,00</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">7.488<span class="a-price-decimal">,</span></span><span class="a-price-fraction">00</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$3.874,98</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">3.874<span class="a-price-decimal">,</span></span><span class="a-price-fraction">98</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$5.899,90</span><span
aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">5.899<span class="a-price-decimal">,</span></span><span class="a-price-fraction">90</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$3.499,90</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">3.499<span class="a-price-decimal">,</span></span><span class="a-price-fraction">90</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$5.222,38</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">5.222<span class="a-price-decimal">,</span></span><span class="a-price-fraction">38</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$3.299,00</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">3.299<span class="a-price-decimal">,</span></span><span class="a-price-fraction">00</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$5.661,00</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">5.661<span class="a-price-decimal">,</span></span><span class="a-price-fraction">00</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$4.788,90</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">4.788<span class="a-price-decimal">,</span></span><span class="a-price-fraction">90</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$5.999,90</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">5.999<span class="a-price-decimal">,</span></span><span class="a-price-fraction">90</span></span></span>, <span 
class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$8.974,98</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">8.974<span class="a-price-decimal">,</span></span><span class="a-price-fraction">98</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$4.117,43</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">4.117<span class="a-price-decimal">,</span></span><span class="a-price-fraction">43</span></span></span>, <span class="a-price a-text-price" data-a-color="secondary" data-a-size="b" data-a-strike="true"><span class="a-offscreen">R$5.199,00</span><span aria-hidden="true">R$5.199,00</span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$6.935,00</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">6.935<span class="a-price-decimal">,</span></span><span class="a-price-fraction">00</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$3.058,98</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">3.058<span class="a-price-decimal">,</span></span><span class="a-price-fraction">98</span></span></span>, <span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$29,90</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">29<span class="a-price-decimal">,</span></span><span class="a-price-fraction">90</span></span></span>, <span class="a-price a-text-price" data-a-color="secondary" data-a-size="b" data-a-strike="true"><span class="a-offscreen">R$34,89</span><span aria-hidden="true">R$34,89</span></span>]

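You can use the CSS selector [class="a-price"] to get only the tags whose class attribute is exactly "a-price", and none of the others. A class filter matches if "a-price" is any one of the tag's classes, whereas this attribute selector compares the whole class value.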

For example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.amazon.com.br/s?k=iphone&__mk_pt_BR=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss_2'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'lxml')
# [class="a-price"] matches only tags whose class attribute is exactly "a-price"
for t in soup.select('[class="a-price"]'):
    print(t)
    print('-' * 80)

Prints:

<span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$6.226,87</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">6.226<span class="a-price-decimal">,</span></span><span class="a-price-fraction">87</span></span></span>
--------------------------------------------------------------------------------
<span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$1.486,90</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">1.486<span class="a-price-decimal">,</span></span><span class="a-price-fraction">90</span></span></span>
--------------------------------------------------------------------------------
<span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$7.488,00</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">7.488<span class="a-price-decimal">,</span></span><span class="a-price-fraction">00</span></span></span>
--------------------------------------------------------------------------------
<span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$3.874,98</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">3.874<span class="a-price-decimal">,</span></span><span class="a-price-fraction">98</span></span></span>
--------------------------------------------------------------------------------
<span class="a-price" data-a-color="base" data-a-size="l"><span class="a-offscreen">R$3.499,90</span><span aria-hidden="true"><span class="a-price-symbol">R$</span><span class="a-price-whole">3.499<span class="a-price-decimal">,</span></span><span class="a-price-fraction">90</span></span></span>
--------------------------------------------------------------------------------
... and so on.
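The same selector can be checked offline. This sketch assumes a hypothetical inline HTML snippet modeled on the question's output and uses html.parser instead of lxml. It also shows an alternative that is not part of the answer above, span.a-price:not(.a-text-price), which keeps a-price spans but excludes only the strike-through list price; [class="a-price"], by contrast, would also drop a span carrying any other extra class.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modeled on the search-result markup in the question
HTML = """
<span class="a-price"><span class="a-offscreen">R$6.226,87</span></span>
<span class="a-price a-text-price"><span class="a-offscreen">R$6.628,98</span></span>
<span class="a-price"><span class="a-offscreen">R$5.099,90</span></span>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Attribute selector: compares the whole class value, so "a-price a-text-price" is skipped
exact = soup.select('[class="a-price"]')

# Alternative (not from the answer): exclude only spans that also have a-text-price
no_strike = soup.select("span.a-price:not(.a-text-price)")

# The a-offscreen child holds the full price text
prices = [tag.select_one(".a-offscreen").get_text() for tag in exact]
print(prices)
```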

Try using the class_ parameter in the find_all() function:

s = BeautifulSoup(resp.content, features="lxml")
prices = s.find_all("span", class_="a-price")
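On its own, class_="a-price" matches the same elements as attrs={"class": "a-price"} in the question, because BeautifulSoup checks each value of the multi-valued class attribute, so the a-text-price spans still come back. If you want to stay with find_all(), one option (a sketch, not part of the original answer) is to pass a function that compares the tag's full class list. The snippet uses a hypothetical inline HTML sample and html.parser; is_plain_price is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modeled on the search-result markup in the question
HTML = """
<span class="a-price"><span class="a-offscreen">R$6.226,87</span></span>
<span class="a-price a-text-price"><span class="a-offscreen">R$6.628,98</span></span>
<span class="a-price"><span class="a-offscreen">R$5.099,90</span></span>
"""

soup = BeautifulSoup(HTML, "html.parser")


def is_plain_price(tag):
    # Hypothetical helper: True only for <span> tags whose class list is exactly ["a-price"]
    return tag.name == "span" and tag.get("class") == ["a-price"]


# class_="a-price" alone still returns the strike-through span too
loose = soup.find_all("span", class_="a-price")
# find_all() also accepts a function that receives each tag
strict = soup.find_all(is_plain_price)
print(len(loose), len(strict))
```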

Using

for t in s.select('[class="a-price-whole"]'):
    print(''.join(t.stripped_strings))

gives:

6.226, 1.486, 7.488, 3.874, 3.499, 5.099, 5.222, 3.097, 5.661, 5.899, 8.974, 6.935, 3.058, 29, 2.559,
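The comma after each number comes from the nested a-price-decimal span inside a-price-whole: stripped_strings yields both "6.226" and ",". A minimal sketch using one price span copied from the question's output, which also appends the a-price-fraction text to rebuild the full displayed price (html.parser is used here instead of lxml):

```python
from bs4 import BeautifulSoup

# One price span taken from the question's printed output
HTML = ('<span class="a-price" data-a-color="base" data-a-size="l">'
        '<span class="a-offscreen">R$6.226,87</span>'
        '<span aria-hidden="true"><span class="a-price-symbol">R$</span>'
        '<span class="a-price-whole">6.226<span class="a-price-decimal">,</span></span>'
        '<span class="a-price-fraction">87</span></span></span>')

soup = BeautifulSoup(HTML, "html.parser")

whole = soup.select_one('[class="a-price-whole"]')
# stripped_strings also walks the nested a-price-decimal span, hence the trailing ","
whole_text = "".join(whole.stripped_strings)

# Appending the fraction span gives the full displayed price
fraction = soup.select_one(".a-price-fraction").get_text()
full_price = whole_text + fraction
print(whole_text, full_price)
```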
