Scraping a data table with Python Selenium, BeautifulSoup, and Pandas fails



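I am trying to use Python to scrape/extract the table on the following website. (It is a dynamic table, so I can't just save the HTML to a file, because it is updated every so often.) https://www.eib.org/en/about/procurement/index.htm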

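My goal is to turn the table into a DataFrame. I have tried:

  1. Selenium and BeautifulSoup, both of which return an empty list
  2. Pandas with pd.read_html, which raises a "No tables found" error

Any ideas? How can I fix this?

Here is my code: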

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver import ActionChains
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

URL = 'http://www.eib.org/en/about/procurement/index.htm'
driver = webdriver.Firefox(executable_path='/Users/***********')
driver.get(URL)
r = requests.get(URL)
soup = BeautifulSoup(r.content, "lxml")
page = driver.page_source
page_soup = BeautifulSoup(page, 'html.parser')

# Using Beautiful Soup
elements = soup.find_all("tr")
print(elements)
for e in elements:
    dr = e.find("td")
    if dr is not None:  # skip rows without a <td> cell (e.g. header rows)
        print(dr.text)

# Using Selenium
elems = driver.find_elements(By.XPATH, "//td")
for elem in elems:
    e = elem.find_element(By.TAG_NAME, "a")
    print(e.text)

# Using pandas
pd.read_html(URL)

Thanks!

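Try the URL whose response actually contains the table. The page loads the table from a separate request, which you can find by searching the "Network" tab in your browser's developer tools: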

import pandas as pd
url = 'https://www.eib.org/tools/jsp/calls.jsp?&lang=en&language=en&l=en&url=/about/procurement/index.htm&forceLanguage=en&_=1616778335822'
df = pd.read_html(url)[0]
print(df)

Output:

print(df)
           Date                                              Title     Status
0    26/03/2021  Education Buildings in Ireland — Energy Effi...  On going
1    18/03/2021  Public Procurement Expertise and Support to ov...     Closed
2    18/03/2021  Advisory Support to Project Advisory Support U...     Closed
3    17/03/2021  Advanced Case Management System (e-CMS) for th...     Closed
4    16/03/2021  Technical Assistance to Support the Implementa...  On going
..          ...                                                ...        ...
524  01/09/2005  Maintenance contract for the EIB's parkland an...     Closed
525  05/08/2005                 Cleaning services for EIB premises     Closed
526  15/07/2005  Network maintenance engineering and acquisitio...     Closed
527  13/07/2005                      Contract relating to removals     Closed
528  09/07/2003  Extension of call for tenders for translation/...     Closed
[529 rows x 3 columns]
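
One caveat with the URL above: the trailing `_=1616778335822` looks like a cache-busting timestamp in milliseconds, of the kind jQuery appends to AJAX requests. That is an assumption, not something the site documents. Under that assumption, a minimal sketch that rebuilds the URL with a fresh value on each call could look like the following. The names `build_calls_url` and `fetch_calls_table` are hypothetical helpers introduced for illustration; they are not part of pandas or of the site.

```python
import time
from urllib.parse import urlencode

# Endpoint taken from the answer above (found via the browser's Network tab).
BASE_URL = "https://www.eib.org/tools/jsp/calls.jsp"


def build_calls_url(now_ms=None):
    """Return the table endpoint URL with a fresh `_` parameter.

    Assumption: `_` is only a cache-busting timestamp in milliseconds.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    params = {
        "lang": "en",
        "language": "en",
        "l": "en",
        "url": "/about/procurement/index.htm",
        "forceLanguage": "en",
        "_": now_ms,
    }
    # safe="/" keeps the slashes in the `url` parameter unescaped,
    # matching the query string seen in the Network tab.
    return f"{BASE_URL}?{urlencode(params, safe='/')}"


def fetch_calls_table(url=None):
    """Download the endpoint and return its first HTML table as a DataFrame."""
    # Imported here so that build_calls_url stays usable without pandas.
    import pandas as pd

    return pd.read_html(url or build_calls_url())[0]
```

Calling `fetch_calls_table()` should then return the same three-column table (Date, Title, Status) shown above, provided the endpoint still responds as it did when the answer was written.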
