Beautiful Soup - not all href links seem to be extracted



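I'm trying to extract all the href links inside the divs with class ['address']. Every time I run the code I only get the first 5 and nothing more, even though I know there should be 9.

Web page: https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch

I have read through the threads below and revised my code countless times, including switching between all the parsers (html.parser, html5lib, lxml, xml, lxml-xml), but nothing seems to work. Any idea what causes it to stop after the fifth iteration? I'm still fairly new to Python, so I apologize if this is a rookie mistake I've overlooked. Any help would be greatly appreciated, even sarcastic answers :)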

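  • Beautiful Soup findAll doesn't find them all

  • Beautiful Soup 4 find_all don't find links that Beautiful Soup 3 finds

  • BeautifulSoup fails to parse long view state

  • Beautifulsoup lost nodes

  • Missing parts on Beautiful Soup results

  • Python 64 bit not storing as long of string as 32 bit python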

I used very similar code on the pages below and had no trouble scraping the hrefs:

  • https://www.walgreens.com/storelistings/storesbystate.jsp?requestType=locator

  • https://www.walgreens.com/storelistings/storesbycity.jsp?requestType=locator&state=AK

My code is below:

import requests
from bs4 import BeautifulSoup

local_rg = requests.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
local_rg_content = local_rg.content
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')
for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)

My results (the first 5):

  1. /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
  2. /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
  3. /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
  4. /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
  5. /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680

But there should be 9:

  1. /locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
  2. /locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
  3. /locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
  4. /locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
  5. /locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
  6. /locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
  7. /locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
  8. /locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
  9. /locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681

Try using selenium instead of requests to get the page source. Here's how you can do it:

from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')

The rest of the code stays the same. Here is the complete code:

from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch')
local_rg_content = driver.page_source
driver.close()
local_rg_content_src = BeautifulSoup(local_rg_content, 'lxml')
for link in local_rg_content_src.find_all('div'):
    local_class = str(link.get('class'))
    if str("['address']") in str(local_class):
        local_a = link.find_all('a')
        for a_link in local_a:
            local_href = str(a_link.get('href'))
            print(local_href)

Output:

/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
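
As a side note, the nested loop and string comparison on the class attribute could probably be collapsed into a single CSS selector. The following is only a minimal sketch: SAMPLE_HTML is made-up markup that imitates the structure described in the question, not the real page, and it uses the built-in html.parser so it runs without lxml. Note that div.address matches any div whose class list contains "address", which is slightly broader than the exact ['address'] comparison above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described in the question;
# it is NOT copied from the real page.
SAMPLE_HTML = """
<div class="address"><a href="/locator/store-a/id=1">Store A</a></div>
<div class="address"><a href="/locator/store-b/id=2">Store B</a></div>
<div class="hours"><a href="/hours/id=1">Hours</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "div.address a[href]" selects every <a> that has an href attribute and
# sits inside a <div> whose class list contains "address".
hrefs = [a["href"] for a in soup.select("div.address a[href]")]
print(hrefs)
```

The selector only changes how the links are picked out of the parsed HTML; it does not help if the links are missing from the HTML that was downloaded in the first place.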

The page uses Ajax to load the store information from an external URL. You can use the requests and json modules to load it:

import re
import json
import requests

url = 'https://www.walgreens.com/storelocator/find.jsp?requestType=locator&state=AK&city=ANCHORAGE&from=localSearch'
ajax_url = 'https://www.walgreens.com/locator/v1/stores/search?requestor=search'
# the latitude/longitude of the search are embedded in the HTML of the page
m = re.search(r'"lat":([\d.-]+),"lng":([\d.-]+)', requests.get(url).text)
params = {
    'lat': m.group(1),
    'lng': m.group(2)
}
data = requests.post(ajax_url, json=params).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for result in data['results']:
    print(result['store']['address']['street'])
    print('https://www.walgreens.com' + result['storeSeoUrl'])
    print('-' * 80)

Prints:

1470 W NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-1470+w+northern+lights+blvd-anchorage-ak-99503/id=15092
--------------------------------------------------------------------------------
725 E NORTHERN LIGHTS BLVD
https://www.walgreens.com/locator/walgreens-725+e+northern+lights+blvd-anchorage-ak-99503/id=13656
--------------------------------------------------------------------------------
4353 LAKE OTIS PARKWAY
https://www.walgreens.com/locator/walgreens-4353+lake+otis+parkway-anchorage-ak-99508/id=15653
--------------------------------------------------------------------------------
7600 DEBARR RD
https://www.walgreens.com/locator/walgreens-7600+debarr+rd-anchorage-ak-99504/id=12679
--------------------------------------------------------------------------------
2197 W DIMOND BLVD
https://www.walgreens.com/locator/walgreens-2197+w+dimond+blvd-anchorage-ak-99515/id=12680
--------------------------------------------------------------------------------
2550 E 88TH AVE
https://www.walgreens.com/locator/walgreens-2550+e+88th+ave-anchorage-ak-99507/id=15654
--------------------------------------------------------------------------------
12405 BRANDON ST
https://www.walgreens.com/locator/walgreens-12405+brandon+st-anchorage-ak-99515/id=13449
--------------------------------------------------------------------------------
12051 OLD GLENN HWY
https://www.walgreens.com/locator/walgreens-12051+old+glenn+hwy-eagle+river-ak-99577/id=15362
--------------------------------------------------------------------------------
1721 E PARKS HWY
https://www.walgreens.com/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681
--------------------------------------------------------------------------------
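
The coordinate-extraction step in the code above fails with an AttributeError if the pattern is not found in the page, because re.search then returns None. A slightly more defensive sketch is shown here, assuming the intended pattern matches digits, dots and minus signs. The helper names extract_coords and absolute_url and the sample string are hypothetical and only for illustration; the sample is not real page content.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.walgreens.com"
# Same idea as the pattern in the answer: capture the "lat" and "lng" values.
COORDS_RE = re.compile(r'"lat":([\d.-]+),"lng":([\d.-]+)')


def extract_coords(page_text):
    """Return {'lat': ..., 'lng': ...} as strings, or None if nothing matches."""
    m = COORDS_RE.search(page_text)
    if m is None:
        return None
    return {"lat": m.group(1), "lng": m.group(2)}


def absolute_url(path):
    """Turn a site-relative path into an absolute URL on BASE_URL."""
    return urljoin(BASE_URL, path)


# Made-up sample text standing in for the downloaded page source.
sample = 'var cfg = {"lat":61.2,"lng":-149.9,"zoom":10};'
coords = extract_coords(sample)
print(coords)
print(absolute_url("/locator/walgreens-1721+e+parks+hwy-wasilla-ak-99654/id=12681"))
```

Checking for None before calling m.group() makes it easier to tell a changed page layout apart from a network or parsing problem.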
