I'm new to Selenium with Python. I want to get all of the hidden href links.
<div class="page-body">
<div class="page-title"></div>
<div class="page cursorPointer">
<a title="" data-placement="top" data-toggle="tooltip" href="#" data-original-title="Verified"></a></div>
</div>
Here is my script:
#!/usr/bin/python3
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
from selenium.webdriver.chrome.options import Options
import requests
import re
from openpyxl import Workbook
driver = webdriver.Chrome(options=options)
driver.get(
"https://someurl.com")
pagelist = []
content = driver.page_source
soup = BeautifulSoup(content, 'lxml')
for a in soup.findAll('div', attrs={'class': 'page cursorPointer'}):
    page = a.find_element_by_xpath("//a[@href]")
    pagelist.append(page.get_attribute("href"))
df = pd.DataFrame({'Page': pagelist})
df.to_excel('pagelist.xlsx', index=False, encoding='utf-8')
I get this error:
page = a.find_element_by_xpath("//a[@href]")
TypeError: 'NoneType' object is not callable
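This happens because you are calling a Selenium method on a BeautifulSoup object. BeautifulSoup treats an unknown attribute such as find_element_by_xpath as a lookup for a child tag with that name, finds none, and returns None; calling None then raises the TypeError. Ask the driver for the links directly instead. Try it like this: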
pagelist = driver.execute_script("""
return [...document.querySelectorAll('a[href]')].map(a => a.href)
""")
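If you would rather keep parsing driver.page_source with BeautifulSoup, a minimal sketch using only BeautifulSoup methods might look like the following. The helper name extract_hrefs is hypothetical, the sample markup is copied from the question, and the built-in "html.parser" is assumed here in place of 'lxml' so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; in the real script this
# would be driver.page_source.
html = """
<div class="page-body">
<div class="page-title"></div>
<div class="page cursorPointer">
<a title="" data-placement="top" data-toggle="tooltip" href="#" data-original-title="Verified"></a></div>
</div>
"""


def extract_hrefs(page_source):
    """Return the href of every <a> inside a div.page.cursorPointer (hypothetical helper)."""
    soup = BeautifulSoup(page_source, "html.parser")
    # select() takes a CSS selector, so no Selenium call is made on the soup.
    return [a["href"] for a in soup.select("div.page.cursorPointer a[href]")]


pagelist = extract_hrefs(html)
print(pagelist)
```

Note that BeautifulSoup returns the raw attribute value (here "#"), whereas a.href in the JavaScript version returns the URL resolved against the page address, so the two approaches can give different strings for relative links.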