How do I scrape a table of links, click a link, and then scrape the data behind that link?



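I am currently trying to scrape a table from this site, "https://racing.hkjc.com/racing/information/English/racing/RaceCard.aspx?RaceDate=2023/04/06&Racecourse=HV&RaceNo=1", then click a horse's name, which leads to a new link, and scrape the table there.

This is the code I have so far. It is only test code for the first horse. (Some of the imports are for future use.)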

import pandas as pd
import xlsxwriter
from bs4 import BeautifulSoup
from playwright.sync_api import Playwright, sync_playwright, expect
import xlwings as xw

def scrape_ranking(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.click('text="AI ONE"')    # the link that will lead us to the horse info
        html = page.content()
        browser.close()
    tables = pd.read_html(html)
    df = tables[0]
    df.to_excel("hkjc.xlsx", index=False)

url_1 = ('https://racing.hkjc.com/racing/information/English/racing/RaceCard.aspx?RaceDate=2023/04/06&Racecourse=HV&RaceNo=1')
scrape_ranking(url_1)

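This code does not crash. However, instead of outputting the horse's record table, it outputs the original table from this site, "https://racing.hkjc.com/racing/information/English/racing/RaceCard.aspx?RaceDate=2023/04/06&Racecourse=HV&RaceNo=1" (the race card).

Is there a way to make the code click the horse's name (the link), which leads to a new page (the horse's record), and output the table from there?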

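The site opens the horse's details in a popup window, so `page` still refers to the race card and `page.content()` returns the race card's HTML. You can use the code from the documentation for handling popups and waiting for the page to load: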

# ...
page.goto(url)
with page.expect_popup() as popup_info:
    page.click('text="AI ONE"')
popup = popup_info.value
popup.wait_for_load_state("domcontentloaded")
html = popup.content()
# ...
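
For reference, here is a minimal sketch of how the popup handling above might fit into the question's function and be repeated for several horses. The names `text_selector` and `scrape_horse_records` are hypothetical, and the sketch assumes that every horse name on the race card opens a popup in the same way "AI ONE" does. It returns every table found in each popup, because which of them is the record table is not stated here.

```python
from io import StringIO


def text_selector(name: str) -> str:
    """Build the exact-text selector used in the answer, e.g. text="AI ONE"."""
    return f'text="{name}"'


def scrape_horse_records(url, horse_names):
    """Return {horse name: list of DataFrames parsed from that horse's popup}.

    Hypothetical helper: assumes each name is a link that opens a popup.
    """
    # Imported here so the helper above can be used without a browser installed.
    import pandas as pd
    from playwright.sync_api import sync_playwright

    records = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        for name in horse_names:
            # Register the popup listener before the click that triggers it.
            with page.expect_popup() as popup_info:
                page.click(text_selector(name))
            popup = popup_info.value
            popup.wait_for_load_state("domcontentloaded")
            html = popup.content()  # HTML of the popup, not of the race card
            popup.close()
            # StringIO avoids the deprecation warning for literal HTML strings.
            records[name] = pd.read_html(StringIO(html))
        browser.close()
    return records
```

Calling `scrape_horse_records(url_1, ["AI ONE"])` should give a dictionary with one entry. From there, a table can be picked by index and saved with `to_excel`, as in the question.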
