Unable to scrape a website's full HTML content with Python Selenium or the requests library



I am trying to scrape the site "https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/12527-Artificial-intelligence-ethical-and-legal-requirements/feedback_en?p_id=24212003", but the response contains only the title and very little body text; I cannot get the full paragraph content or the page links.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# run Chrome headless with a fixed window size
options = Options()
options.headless = True
options.add_argument("--window-size=1920,1200")

# DRIVER_PATH is the path to the local chromedriver executable
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
driver.get("https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/12527-Artificial-intelligence-ethical-and-legal-requirements/feedback_en?p_id=24212003&page=1")
print(driver.page_source)
driver.quit()
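
The title also mentions the requests library; a minimal sketch of that kind of attempt (the exact code is not in the question, and the timeout value is arbitrary) returns only the static shell as well, presumably because the feedback content is rendered client-side.

import requests
from bs4 import BeautifulSoup

url = ("https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/"
       "12527-Artificial-intelligence-ethical-and-legal-requirements/feedback_en"
       "?p_id=24212003&page=1")

# plain HTTP fetch: no JavaScript is executed, so only the initial HTML shell comes back
resp = requests.get(url, timeout=30)
soup = BeautifulSoup(resp.text, 'html.parser')
print(soup.title)                           # the <title> is there...
print(len(soup.find_all('a', href=True)))   # ...but hardly any content links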

In both cases the response contains none of the href links or tags that I need from the website; the resulting output is just the page title and a small amount of body text.
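
One thing worth ruling out before changing any browser options is timing: page_source may be read before the client-side rendering has finished. Below is a minimal sketch with an explicit wait; the 30-second timeout and the generic <a> locator are assumptions, and DRIVER_PATH is the same placeholder as above.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1200")

driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
driver.get("https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/12527-Artificial-intelligence-ethical-and-legal-requirements/feedback_en?p_id=24212003&page=1")

# block until at least one <a> element is attached to the DOM, i.e. until the
# client-side rendering has produced some real content (or 30 seconds elapse)
WebDriverWait(driver, 30).until(
    EC.presence_of_all_elements_located((By.TAG_NAME, "a"))
)

html = driver.page_source
driver.quit()
print(html)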

Got it working by trying a few experimental Chrome options and then scraping the data:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

# set Chrome options and run headless
chrome_options = Options()
chrome_options.add_experimental_option("excludeSwitches",
                                       ["enable-automation",
                                        "disable-component-update",
                                        "ignore-certificate-errors"])
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_argument("--headless")

driver = webdriver.Chrome(options=chrome_options, executable_path=DRIVER_PATH)
driver.get("https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/12527-Artificial-intelligence-ethical-and-legal-requirements/feedback_en?p_id=24212003&page=1")
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

# find all <a> tags and collect text -> href using a dict comprehension
d = {x.text: x['href'] for x in soup.find_all('a', href=True)}
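
With the dictionary in hand, a short usage sketch follows; the idea that the feedback text sits in plain <p> elements is my assumption, and the live page may need a more specific, class-based selector.

# print every link BeautifulSoup found (link text -> href)
for label, href in d.items():
    print(label, '->', href)

# hypothetical extra step: pull the visible paragraph text from the same soup;
# the bare <p> lookup is an assumption about the page structure
paragraphs = [p.get_text(strip=True) for p in soup.find_all('p')]
print('\n'.join(paragraphs))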

Latest update