Using XPath in Selenium to reference a specific tag



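Hello, I am getting the error AttributeError: 'str' object has no attribute 'find_element', even though my code seems correct. I looked around on Stack Overflow but did not find a solution to my specific problem. I want to get the value that follows the <br> tag in the fifth <p> element of my HTML. Please correct me if you see any mistakes.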

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import pandas as pd
import time
options = Options()
# Creating our dictionary
all_services = pd.DataFrame(columns=['Motif', 'Description'])
path = "C:/Users/Al4D1N/Documents/ChromeDriver_webscraping/chromedriver.exe"
driver = webdriver.Chrome(options=options, executable_path=path)
driver.get("https://www.mairie.net/national/acte-naissance.htm#plus")
# We store our 'Motif' & 'Description' for the first link 'acte-naissance'
service = driver.find_element(By.CLASS_NAME, "section-title").text
desc = service.find_element(By.XPATH, "//*[@class='section-group']/p[5]/following::br")
print(desc)
all_services = all_services.append({'Motif': service, 'Description': desc}, ignore_index=True)
# Get all elements in class 'list-images'
list_of_services = driver.find_elements_by_class_name("list-images")
all_services.to_excel('Services.xlsx', index=False)

I was able to get the text you want with this combination:

service = driver.find_element_by_class_name("section-title").text
txt = driver.find_element_by_xpath("//*[@class='section-group']/p[5]").text
# The <br> appears as a newline in .text; keep the part after it
desc = txt.split("\n")[1]
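The split step can be checked without a browser. The sketch below assumes that Selenium's .text renders a <br> as a newline, and the sample string is a hypothetical stand-in, not the real page content:

```python
# Hypothetical stand-in for what .text might return for the fifth <p>:
# a label, a line break produced by <br>, then the text we want.
txt = "Label :\nText after the line break"

# Split on the newline and keep what follows it, guarding against
# a paragraph that contains no line break at all.
parts = txt.split("\n")
desc = parts[1].strip() if len(parts) > 1 else ""

print(desc)
```

The length check is only a precaution: indexing [1] directly raises IndexError when the paragraph has no <br>.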

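I also did this with requests and bs4, which is usually a bit faster than a headless browser (with the same caveats about how navigable the page is for a scraper):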

import requests
from bs4 import BeautifulSoup as Bs
r = requests.get("https://www.mairie.net/national/acte-naissance.htm#plus")
html = Bs(r.text, "lxml")
section = html.find("h1", {"class": "section-title"}).get_text()
div = html.find("div", {"class": "section-group"})
p = div.find_all("p")[4]  # fifth <p> (index is zero-based)
p.strong.extract()  # remove the bold label from the paragraph
txt = p.get_text().split("\t")
desc = (txt[1] + txt[2]).replace("\n", " ")[:-1]

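Try removing .text from:

service = driver.find_element(By.CLASS_NAME, "section-title").text

.text returns a plain string, and a string has no find_element method. Without .text, service stays a WebElement, and you can use service.text wherever the title string is needed.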
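The cause of the error can be reproduced without Selenium. This is a minimal sketch in which the title string is a hypothetical value standing in for whatever .text returned:

```python
# .text gives back a plain Python str, not a WebElement.
title_text = "Some section title"  # hypothetical value of element.text

try:
    # str has no find_element method, so this raises AttributeError.
    title_text.find_element("xpath", "//p")
except AttributeError as exc:
    message = str(exc)

print(message)
```

In the Selenium code, this suggests keeping the element and its text in separate variables, and calling find_element only on the driver or on an element.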
