Parsing a web page that uses a search bar

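I need to parse store names (<div class="LocationName">) from https://www.comicshoplocator.com/StoreLocator. The problem is that when you enter a zip code (for example 73533) in the search box, it does not appear in the URL, so Python cannot see the elements on the results page. Here is my code snippet; it produces no output.

How can I make Python see the page that results from entering the zip code? Thanks.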

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

browser = webdriver.Firefox(executable_path=r'C:\Geckodriver\Geckodriver.exe')
browser.get('https://www.comicshoplocator.com/StoreLocator')
browser.find_element(By.NAME, 'query').send_keys('73533' + Keys.RETURN)
html = browser.page_source
soup = BeautifulSoup(html, features="html.parser")
for tag in soup.find_all('div', class_="LocationName"):
    print(tag.text)

The problem is here: browser.find_element(By.NAME, 'query').send_keys('73533' + Keys.RETURN)

The correct version is:

search = browser.find_element(By.NAME, 'query')
search.send_keys('73533')
search.send_keys(Keys.RETURN)

Full working code:

I am using the Chrome driver; you can change that part in no time.

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By

# Download a matching chromedriver and start Chrome
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.comicshoplocator.com/StoreLocator')
driver.maximize_window()
time.sleep(2)  # give the page time to load

# Type the zip code into the search box, then submit it
d = driver.find_element(By.NAME, 'query')
d.send_keys('73533')
d.send_keys(Keys.ENTER)

# Parse the rendered page and print every store name
soup = BeautifulSoup(driver.page_source, 'lxml')
for tag in soup.find_all('div', class_="LocationName"):
    print(tag.text)

Output:

MARK DOWN COMICS
WWW.DCBSERVICE.COM
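
The code above relies on a fixed time.sleep(2) before typing, and reads page_source right after pressing Enter. If the results happen to render late, the loop may print nothing. One possible alternative is to poll until the results are present. The helper below is only a sketch: wait_until and its parameters are names made up for this illustration, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed. Hypothetical helper, written for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

With the driver from the answer, this could be called as wait_until(lambda: driver.find_elements(By.CLASS_NAME, 'LocationName')) before reading driver.page_source. Selenium also ships its own WebDriverWait class for the same purpose, which is probably the better choice in real code.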

Actually, this can be done with requests; there is no need for Selenium. You can send a POST request to:

https://www.comicshoplocator.com/StoreLocator
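
This would also explain why the zip code never shows up in the address bar: a POST form carries its fields in the request body, not in the URL. As a rough illustration using only the standard library, here is how such a body is encoded. The field names are the ones used in this answer's code; whether the site's form sends exactly these fields is the answerer's finding and is not verified here.

```python
from urllib.parse import urlencode

# Form fields as used in this answer (assumed to match the site's search form)
form_fields = {"showAll": "false", "showCsls": "true", "query": "73533"}

# application/x-www-form-urlencoded body, which is what a POST form submits
body = urlencode(form_fields)
print(body)  # showAll=false&showCsls=true&query=73533
```

requests performs this encoding itself when a dict is passed as data=, so the code that follows does not need to call urlencode.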

import re
import requests
from bs4 import BeautifulSoup

# Form fields sent with the search
data = {
    "showAll": "false",
    "showCsls": "true",
    "query": "73533",
}
response = requests.post(
    "https://www.comicshoplocator.com/StoreLocator",
    data=data,
)
soup = BeautifulSoup(response.text, "html.parser")
# The store data is embedded in an inline <script> that mentions 'address'
string = soup.select_one("script:-soup-contains('address')").string
# capture the {...} object literal that sits inside parentheses
unformatted_data = re.search(r"\((\{.*?\})\)", string, re.DOTALL).group(1)
# remove all the whitespace
formatted_data = re.sub(r"\s+", "", unformatted_data)
print(formatted_data)

Prints:

{storeno:"8816",lat:"41.0671081542969",lng:"-85.1372680664063",name:"WWW.DCBSERVICE.COM",address:"6005ESHELBYDR",address2:"WWW.DCBSERVICE.COM",city:"MEMPHIS",state:"TN",zip:"38141",phone:"",hasProfile:"True",storeLogo:'/Image/CslsLogo/'+"8816"}
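
The printed string is a JavaScript object literal, not valid JSON (the keys are unquoted and storeLogo is a string concatenation), so json.loads would reject it. A minimal sketch of one way to turn it into a Python dict, assuming every value of interest is a double-quoted string as in the output above. parse_store is a hypothetical helper name.

```python
import re

# The string printed above, copied verbatim
raw = (
    '{storeno:"8816",lat:"41.0671081542969",lng:"-85.1372680664063",'
    'name:"WWW.DCBSERVICE.COM",address:"6005ESHELBYDR",'
    'address2:"WWW.DCBSERVICE.COM",city:"MEMPHIS",state:"TN",zip:"38141",'
    'phone:"",hasProfile:"True",storeLogo:\'/Image/CslsLogo/\'+"8816"}'
)


def parse_store(text):
    """Collect every key:"value" pair whose value is a double-quoted string.

    storeLogo is skipped because its value starts with a single quote.
    """
    return dict(re.findall(r'(\w+):"([^"]*)"', text))


store = parse_store(raw)
print(store["name"], store["city"], store["state"], store["zip"])
# WWW.DCBSERVICE.COM MEMPHIS TN 38141
```

Because the whitespace was stripped beforehand, the address comes back as "6005ESHELBYDR"; running the same pattern on the unstripped text should keep the spaces inside the values.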

This code worked for me:

listings = browser.find_elements(By.CLASS_NAME, 'CslsLocationItem')
for listing in listings:
    print(listing.find_element(By.CLASS_NAME, 'LocationName').get_attribute('innerText'))
