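I need to parse the store names (<div class="LocationName">) from https://www.comicshoplocator.com/StoreLocator. The problem is that when you enter a zip code in the search box (e.g. 73533), it does not appear in the URL, so Python cannot see the elements on the results page. Here is my code snippet; it produces no output.
How can I make Python see the page after the zip code has been entered?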
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
browser = webdriver.Firefox(executable_path=r'C:\Geckodriver\Geckodriver.exe')
browser.get('https://www.comicshoplocator.com/StoreLocator')
browser.find_element(By.NAME, 'query').send_keys('73533' + Keys.RETURN)
html = browser.page_source
soup = BeautifulSoup(html, features="html.parser")
for tag in soup.find_all('div', class_="LocationName"):
    print(tag.text)
The problem is here: browser.find_element(By.NAME, 'query').send_keys('73533' + Keys.RETURN)
The correct version is:
search = browser.find_element(By.NAME, 'query')
search.send_keys('73533')
search.send_keys(Keys.RETURN)
Full working code:
I used the Chrome driver; you can swap that part for your own driver.
import time
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.comicshoplocator.com/StoreLocator')
driver.maximize_window()
time.sleep(2)
d=driver.find_element(By.NAME, 'query')
d.send_keys('73533')
d.send_keys(Keys.ENTER)
time.sleep(2)  # give the results list time to render before reading page_source
soup = BeautifulSoup(driver.page_source, 'lxml')
for tag in soup.find_all('div', class_="LocationName"):
    print(tag.text)
Output:
MARK DOWN COMICS
WWW.DCBSERVICE.COM
Well, actually, this can be done with requests; there is no need for Selenium. You can send a POST request to:
https://www.comicshoplocator.com/StoreLocator
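Because the zip code travels in the POST body rather than in the URL, it can help to see what that body actually looks like. The sketch below uses only the standard library to build (but not send) a request carrying the same three form fields as the requests example, assuming the endpoint expects an ordinary application/x-www-form-urlencoded body, which is what requests produces for a data= dict.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Same form fields as the requests example; assumed to be a plain url-encoded form.
form = {"showAll": "false", "showCsls": "true", "query": "73533"}

body = urlencode(form).encode("ascii")
req = Request(
    "https://www.comicshoplocator.com/StoreLocator",
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# Nothing is sent here; this only shows what would go over the wire.
print(req.get_method())  # POST
print(req.data)          # b'showAll=false&showCsls=true&query=73533'
```

This is why the search never shows up in the address bar: the query string lives in the request body, not in the URL.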
import re
import requests
from bs4 import BeautifulSoup
data = {
"showAll": "false",
"showCsls": "true",
"query": "73533",
}
response = requests.post(
"https://www.comicshoplocator.com/StoreLocator",
data=data,
)
soup = BeautifulSoup(response.text, "html.parser")
string = soup.select_one("script:-soup-contains('address')").string
unformatted_data = re.search(r"(({.*?}))", string, re.DOTALL).group(1)
# remove all the whitespace
formatted_data = re.sub(r"\s+", "", unformatted_data)
print(formatted_data)
Prints:
{storeno:"8816",lat:"41.0671081542969",lng:"-85.1372680664063",name:"WWW.DCBSERVICE.COM",address:"6005ESHELBYDR",address2:"WWW.DCBSERVICE.COM",city:"MEMPHIS",state:"TN",zip:"38141",phone:"",hasProfile:"True",storeLogo:'/Image/CslsLogo/'+"8816"}
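The printed text is a JavaScript object literal, not JSON (the keys are unquoted and storeLogo is a string concatenation), so json.loads will reject it. A minimal sketch, assuming every field of interest is a simple key:"value" pair, is to pull those pairs out with a regex; fields whose value is a JS expression are skipped. Applying it to unformatted_data instead, before the whitespace is stripped, would keep the spaces inside values such as the address.

```python
import re

# The object literal printed above; unquoted keys make it invalid JSON.
js_object = (
    '{storeno:"8816",lat:"41.0671081542969",lng:"-85.1372680664063",'
    'name:"WWW.DCBSERVICE.COM",address:"6005ESHELBYDR",'
    'address2:"WWW.DCBSERVICE.COM",city:"MEMPHIS",state:"TN",zip:"38141",'
    'phone:"",hasProfile:"True",storeLogo:\'/Image/CslsLogo/\'+"8816"}'
)

# Capture simple key:"value" pairs. Values that are JS expressions
# (like storeLogo's concatenation) do not match and are left out.
PAIR = re.compile(r'(\w+):"([^"]*)"')


def parse_store(text):
    """Return a dict of the double-quoted string fields in a JS object literal."""
    return dict(PAIR.findall(text))


store = parse_store(js_object)
print(store["name"], store["city"], store["zip"])  # WWW.DCBSERVICE.COM MEMPHIS 38141
```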
This code worked for me:
listings = browser.find_elements(By.CLASS_NAME, 'CslsLocationItem')
for listing in listings:
    print(listing.find_element(By.CLASS_NAME, 'LocationName').get_attribute('innerText'))
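The snippet above scopes each LocationName lookup to its own CslsLocationItem container, so each name stays tied to its listing. The same scoping can be sketched offline with the standard library's html.parser, assuming both classes sit on div elements as in the snippet. The ListingParser class and the sample markup are hypothetical illustrations, not captured from the site.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect the LocationName text of each CslsLocationItem div (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.names = []          # one entry per CslsLocationItem
        self._depth = 0          # current <div> nesting depth
        self._name_depth = None  # depth of the open LocationName div, if any

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        self._depth += 1
        classes = (dict(attrs).get("class") or "").split()
        if "CslsLocationItem" in classes:
            self.names.append("")
        elif "LocationName" in classes and self.names and self._name_depth is None:
            self._name_depth = self._depth

    def handle_endtag(self, tag):
        if tag != "div":
            return
        if self._depth == self._name_depth:
            self._name_depth = None  # the LocationName div just closed
        self._depth -= 1

    def handle_data(self, data):
        if self._name_depth is not None:
            self.names[-1] += data


# Hypothetical markup shaped like the result list.
sample = """
<div class="CslsLocationItem"><div class="LocationName"> MARK DOWN COMICS </div>
  <div class="Address">address here</div></div>
<div class="CslsLocationItem"><div class="LocationName">WWW.DCBSERVICE.COM</div></div>
"""

parser = ListingParser()
parser.feed(sample)
print([name.strip() for name in parser.names])  # ['MARK DOWN COMICS', 'WWW.DCBSERVICE.COM']
```

In practice you would feed driver.page_source (or response.text) instead of the sample string.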