I am trying to scrape the email addresses, but my code gives me None. This is the page link: https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from time import sleep

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
base_url = 'https://www.avocats-lille.com/'
url = 'https://www.avocats-lille.com/fr/annuaire/avocats-du-tableau-au-barreau-de-lille?view=entries'

# Render the listing page with Selenium and parse it
driver = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe")
driver.get(url)
soup = BeautifulSoup(driver.page_source, "html.parser")

# Collect the entry links found under each h2.title
tra = soup.find_all('h2', class_='title')
productlinks = []
for links in tra:
    for link in links.find_all('a', href=True):
        comp = base_url + link['href']
        productlinks.append(comp)

# Visit each entry page and look for a mailto link
for link in productlinks:
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    sleep(5)
    details = soup.find_all("div", class_="item col-5")
    for detail in details:
        email = soup.find('a[href^="mailto"]')
        print(email)
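The links you are looking for are not inside the elements in tra (the h2 elements with class title).
Change your code as follows to make it work: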
tra = soup.find_all('div',class_='item')
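A minimal sketch of that change, assuming (as stated above) that each entry link sits inside a div with class item on the listing page. The helper name entry_links and the sample markup are hypothetical, not taken from the site. It also swaps the base_url + href concatenation for urllib.parse.urljoin, so a relative href starting with / does not produce a double slash.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.avocats-lille.com/"


def entry_links(listing_html: str, base_url: str = BASE_URL) -> list[str]:
    """Return absolute URLs of the links found inside div.item elements (hypothetical helper)."""
    soup = BeautifulSoup(listing_html, "html.parser")
    links = []
    # class_="item" matches any div whose class list contains "item",
    # so it also matches class="item col-5".
    for item in soup.find_all("div", class_="item"):
        for a in item.find_all("a", href=True):
            # urljoin resolves the href against the base URL instead of
            # gluing strings together (which can yield "//fr/...").
            links.append(urljoin(base_url, a["href"]))
    return links


if __name__ == "__main__":
    # Hypothetical markup, for illustration only.
    sample = """
    <div class="item col-5">
      <a href="/fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry">Entry</a>
    </div>
    """
    print(entry_links(sample))
```

In the real script you would pass driver.page_source (or r.text) instead of the sample string.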
The email address is in the following element:
<a href="mailto:kamelabbas2002@yahoo.fr">kamelabbas2002@yahoo.fr</a>
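Separately, note that BeautifulSoup's find() takes a tag name as its first argument, not a CSS selector, so soup.find('a[href^="mailto"]') searches for a tag literally named a[href^="mailto"] and returns None. The CSS selector form needs select() or select_one(). A minimal sketch, assuming the HTML you parse actually contains the link shown above; the helper name extract_emails is hypothetical:

```python
from bs4 import BeautifulSoup


def extract_emails(page_html: str) -> list[str]:
    """Return the addresses of all mailto: links in the HTML (hypothetical helper)."""
    soup = BeautifulSoup(page_html, "html.parser")
    emails = []
    # select() understands CSS selectors; find()/find_all() do not.
    for a in soup.select('a[href^="mailto:"]'):
        # Read the address from the href, dropping the "mailto:" prefix
        # and any "?subject=..." query part.
        emails.append(a["href"][len("mailto:"):].split("?", 1)[0])
    return emails


if __name__ == "__main__":
    snippet = '<a href="mailto:kamelabbas2002@yahoo.fr">kamelabbas2002@yahoo.fr</a>'
    print(extract_emails(snippet))  # ['kamelabbas2002@yahoo.fr']
```

Reading the href rather than the link text keeps the result correct even if the visible text is something like "Send an email".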
Solution
To print the email address, i.e. the element's innerText, with Selenium, you can use either of the following locator strategies:
Using css_selector:
driver.execute("get", {'url': 'https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry'})
print(driver.find_element("css selector", 'a[href^="mailto"]').text)
Using xpath:
driver.execute("get", {'url': 'https://www.avocats-lille.com//fr/annuaire/avocats-du-tableau-au-barreau-de-lille/2?view=entry'})
print(driver.find_element("xpath", '//a[starts-with(@href, "mailto")]').text)
Console output:
kamelabbas2002@yahoo.fr