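I'm trying to scrape some data from a website, but the data sits inside an iframe. I first scraped the iframe's source link, but I can't scrape the data from that source either. I need help extracting the data from this source link. Here is the source link: https://chartviewer-europublic.bigapis.net/nzgaV/index.html
I've also shared a screenshot here showing the "<a>" tag, but I can't extract that link either.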
This is the code I used. I scraped the page with BeautifulSoup.
# Libraries
from bs4 import BeautifulSoup
import requests
# Original website link
url_spain_total="https://anfac.com/cifras-clave/matriculaciones-turismos-y-todoterreno/"
page_total=requests.get(url_spain_total).text
soup_spain_total=BeautifulSoup(page_total,"lxml")
print(soup_spain_total.prettify())
# Getting the list of links in the iframe
result_spain=soup_spain_total.find_all("iframe")
result_spain
# Getting the required source link
total_main_link=result_spain[1]["src"]
total_main_link
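Picking the iframe by position (result_spain[1]) may break if the page adds or reorders its iframes. Below is a minimal sketch of selecting it by the host name of its src instead. The sample HTML and the helper name find_iframe_srcs are made up for illustration, and the built-in "html.parser" is used only so the sketch does not depend on lxml.

```python
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; the real page will differ.
SAMPLE_HTML = """
<html><body>
  <iframe src="https://example.com/ads/banner.html"></iframe>
  <iframe src="https://chartviewer-europublic.bigapis.net/nzgaV/index.html"></iframe>
  <iframe src="/local/widget.html"></iframe>
  <iframe></iframe>
</body></html>
"""


def find_iframe_srcs(html, base_url, host=None):
    """Return absolute iframe src URLs, optionally filtered by host name."""
    soup = BeautifulSoup(html, "html.parser")
    srcs = []
    # src=True skips iframes that have no src attribute at all
    for frame in soup.find_all("iframe", src=True):
        # Resolve relative links against the page URL
        absolute = urljoin(base_url, frame["src"])
        if host is None or urlparse(absolute).netloc == host:
            srcs.append(absolute)
    return srcs


base = "https://anfac.com/cifras-clave/matriculaciones-turismos-y-todoterreno/"
chart_links = find_iframe_srcs(
    SAMPLE_HTML, base, host="chartviewer-europublic.bigapis.net"
)
print(chart_links)
```

This only locates the link. If the table at that URL is filled in by JavaScript after the page loads, which seems likely here, requests will still receive HTML without the table rows.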
After getting the source link, I'm unable to extract the table contents.
Any help is appreciated. Thanks in advance!
Here is an example of how to get the data using Selenium:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
# chrome_options.add_argument("--headless")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1920,1080")
webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(browser, 20)
url = 'https://chartviewer-europublic.bigapis.net/nzgaV/index.html'
browser.get(url)
table = wait.until(EC.element_to_be_clickable((By.ID, "datatable")))
df = pd.read_html(table.get_attribute("outerHTML"))[0]
print(df)
This gets the information as a DataFrame and prints it in the terminal:
[Printed DataFrame garbled in extraction. Recoverable column headers: Categoría, Acumulato 2021, %Variacion Acumulato. Recoverable row labels: Gasolina, Diesel, Resto, Total combustibles, Particular, Empresa.]
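One detail worth noting about the pd.read_html step: pandas 2.1 and later emit a FutureWarning when a literal HTML string is passed directly, and wrapping the string in io.StringIO avoids it. Here is a minimal offline sketch of that step. The HTML fragment and its numbers are invented stand-ins for what table.get_attribute("outerHTML") might return, not the real data, and it assumes lxml (or bs4 with html5lib) is installed for pandas to parse with.

```python
from io import StringIO

import pandas as pd

# Invented sample standing in for the table's outerHTML; values are not real data.
outer_html = """
<table id="datatable">
  <thead>
    <tr><th>Categoría</th><th>Value</th></tr>
  </thead>
  <tbody>
    <tr><td>Gasolina</td><td>1.5</td></tr>
    <tr><td>Diesel</td><td>2.5</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of DataFrames, one per matching <table>;
# attrs narrows the match to the table with id="datatable".
tables = pd.read_html(StringIO(outer_html), attrs={"id": "datatable"})
df = tables[0]
print(df)
```

In the Selenium script, the same idea would be pd.read_html(StringIO(table.get_attribute("outerHTML")))[0], with from io import StringIO added to the imports.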