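My code visits a web page and tries to pull the information from each row, but it comes back blank.
Expected output: print the title of each row.
Currently, it just prints blank lines.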
import time
import requests
from selenium import webdriver
driver = webdriver.Chrome()
bracket=[]
url='https://www.sabcs.org/Program/Poster-Sessions/Poster-Session-1'
driver.get(url)
time.sleep(3)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
r=requests.get(url)
page_source=r.content
each_field=driver.find_elements_by_xpath(".//tr[@class='normaltext']")
for item in each_field:
    print(item.text)
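You need to switch into the <iframe> element first: the table is rendered inside it, so its rows are not part of the top-level document that Selenium searches by default. I also just use pandas here to parse the table.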
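from io import StringIO
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
url='https://www.sabcs.org/Program/Poster-Sessions/Poster-Session-1'
driver.get(url)
# The table lives inside the last <iframe> on the page, so switch into it first.
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which was removed in Selenium 4.3.
driver.switch_to.frame(driver.find_elements(By.XPATH, ".//iframe")[-1])
# Wrap the HTML in StringIO; newer pandas versions deprecate passing a literal HTML string.
df = pd.read_html(StringIO(driver.page_source))[0]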
Output:
print(df)
0 1
0 NaN NaN
1 Poster Session 1 – Wednesday, December 8, 2021... Poster Session 1 – Wednesday, December 8, 2021...
2 NaN NaN
3 NaN Axillary Staging and Sentinel Nodes
4 P1-01-01 Prospective ultrasonographic surveillance stud...
.. ... ...
279 P1-24-04 Spatially resolved cell type heterogeneity unc...
280 P1-24-05 Breast conserving surgery for non-metastatic i...
281 P1-24-06 Risk factor modeled microenvironment effects l...
282 P1-24-07 Management trends and outcomes assessment for ...
283 NaN NaN
[284 rows x 2 columns]
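The question asked only for the title of each row, while the table above still contains blank spacer rows, a banner row, and section headings. Below is a minimal sketch of one way to tidy it, assuming (as the output suggests) that column 0 holds the poster ID and column 1 holds the title. The helper name `extract_titles`, the column names `poster_id` and `title`, and the sample rows are illustrative assumptions, not part of the original answer.

```python
import pandas as pd


def extract_titles(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows that have both a poster ID and a title.

    Assumes column 0 is the poster ID and column 1 is the title
    (hypothetical names chosen for illustration).
    """
    out = df.iloc[:, :2].copy()
    out.columns = ["poster_id", "title"]
    # Spacer rows are all-NaN and section headings have no ID; drop both.
    out = out.dropna(subset=["poster_id", "title"])
    # The banner row repeats the same text in both columns; drop it too.
    out = out[out["poster_id"] != out["title"]]
    return out.reset_index(drop=True)


# Stand-in for the DataFrame returned by pd.read_html (made-up sample rows).
sample = pd.DataFrame(
    [
        [None, None],
        ["Poster Session 1", "Poster Session 1"],
        [None, "Axillary Staging and Sentinel Nodes"],
        ["P1-01-01", "Example title A"],
        ["P1-01-02", "Example title B"],
        [None, None],
    ]
)

titles = extract_titles(sample)
for title in titles["title"]:
    print(title)
```

With the real page, the same helper would be called as `extract_titles(df)` on the `df` produced in the answer. Whether the section headings should be kept or dropped depends on what you want to print.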