Google Maps: some XPath selectors return data and some don't (Selenium, Python)



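I'm trying to scrape Google Maps. The phone and hours variables don't return any data. The other variables work fine and return data. I believe the XPath is correct, so I'm not sure what is going wrong here.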

Here is the link.

The other selectors, such as name, address, title, and website, return data fine, but phone and hours return nothing.

Hoping for some answers.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from scrapy.selector import Selector
import csv
from tqdm import tqdm
import time
driver = webdriver.Firefox()

linksFile = open("links.txt", 'r')
allLinks = linksFile.readlines()

for link in tqdm(allLinks):
    try:
        driver.get(link)
    except Exception:
        print('Something went wrong with the URL: ')

    # time.sleep(15)

    while True:
        WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.XPATH, '//div[contains(text(), "Directions")] | //div[contains(text(), "Website")]'))
        )
        results = driver.find_elements(By.XPATH, '//div[contains(text(), "Directions")] | //div[contains(text(), "Website")]')
        for result in results:
            business = driver.find_element(By.XPATH, '//div[@role="heading"]/div')
            business.click()
            # waiting for the page to load
            WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.XPATH, '//div[@class="immersive-container"]'))
            )

            # parsing the response with the scrapy selector
            response = Selector(text=driver.page_source)
            name = response.xpath('//h2[@data-attrid="title"]/span/text()').get()
            title = response.xpath('(//span[contains(text(), "Google reviews")])/parent::a/parent::span/parent::span/parent::div/parent::div/parent::div/following-sibling::div/div/span/span/text()').get()
            address = response.xpath('//a[contains(text(), "Address")]/parent::span/following-sibling::span/text()').get()
            website = response.xpath('(//a[contains(text(), "Website")])/@href').get()
            phone = response.xpath('//a[contains(text(), "Phone")]/parent::span/following-sibling::span/a/span/text()').get()
            hours = response.xpath('//a[contains(text(), "Hours")]/parent::span/following-sibling::div/label/span//btext()').get()
            total_reviews = response.xpath('(//span[contains(text(), "Google reviews")])[1]/text()').get()
            total_rating = response.xpath('(//span[contains(text(), "Google reviews")])/parent::a/parent::span/parent::span/parent::div/span/text()').get()

            input('Check: ')

            # writing to the CSV file
            outFile = open("data.csv", 'a+', newline="")
            writer = csv.writer(outFile)

            vals = [name, title, address, website, phone, hours, total_reviews, total_rating]
            writer.writerow(vals)
            outFile.close()

Can you try using JavaScript's outerHTML instead of page_source?

response = Selector(text=driver.execute_script("return document.documentElement.outerHTML"))

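Also, there is a problem in the XPath for hours: the expression in the question ends in //btext(), which is missing the slash between b and text(). It should be: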

hours = response.xpath('//a[contains(text(), "Hours")]/parent::span/following-sibling::div/label/span//b/text()').get()

Try a Google Maps link instead of a Google Search link: https://www.google.com/maps/place/Leduc+管道+和+供暖/@53.274672,-113.5486679,17z/data=!3m1!4b1!4m5!3m4!1s0x539ff9a5d31a87c9:0xf494d91afd55e55!8m2!3d53.2746688!4d-113.5464739

It should be more stable.
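If links.txt mixes both kinds of URL, one way to keep only the Maps place links is sketched below. The function name is_maps_place_link and the two sample URLs are hypothetical; the check assumes place links always have a path starting with /maps/place/, as the link above does.

```python
from urllib.parse import urlparse


def is_maps_place_link(link):
    """Return True for URLs shaped like https://www.google.com/maps/place/..."""
    parsed = urlparse(link.strip())
    return parsed.netloc.endswith("google.com") and parsed.path.startswith("/maps/place/")


# Sample lines as readlines() would return them, trailing newline included.
links = [
    "https://www.google.com/maps/place/Example/@53.27,-113.54,17z\n",
    "https://www.google.com/search?q=example\n",
]

# Strip the newline and drop anything that is not a Maps place link.
place_links = [link.strip() for link in links if is_maps_place_link(link)]
print(place_links)
```

Stripping each line also removes the trailing newline that readlines() leaves on every URL before it is passed to driver.get.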
