Saving a text file from a website using Python



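Using Python, my task is simply to get the HTML source from this site (https://www.cboe.com/us/equities/market_statistics/corporate_action/) and save the first text file listed, named "corporate_action_rpt_20220621.txt". So far I can use BeautifulSoup to read the following line of HTML from the site's source: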

<a href="2022/06/bzx_equities_corporate_action_rpt_20220621.txt-dl">corporate_action_rpt_20220621.txt</a>
Here is the code I am using:
import requests
from bs4 import BeautifulSoup
import os
URL = "https://www.cboe.com/us/equities/market_statistics/corporate_action/"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
table = soup.find('table')
textFileRow = table.tbody.find('tr').find('td').find('a')
print(textFileRow)

How do I open and save the text file from here using Python?

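You have to fetch the file with a second request, using the URL in the href attribute of the a tag you have already retrieved, like this: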

import requests
from bs4 import BeautifulSoup
import os
URL = "https://www.cboe.com/us/equities/market_statistics/corporate_action/"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
table = soup.find('table')
textFileRow = table.tbody.find('tr').find('td').find('a')
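# The href is relative, so append it to the page URL to get the file's URL
r = requests.get(URL + textFileRow['href'])
r.encoding = 'utf-8'
# Write the downloaded text to a local file
with open("textFile.txt", "w", encoding="utf-8") as text_file:
    text_file.write(r.text)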
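
Concatenating URL and the href works here only because the page URL ends with a slash and the href is relative. A slightly more defensive variant is sketched below. It is not part of the original answer: the helper names resolve_href and download_text are hypothetical, and it swaps requests for the standard library's urllib.request so that it runs without extra packages (requests.get would work just as well for the download step).

```python
from pathlib import Path
from urllib.parse import urljoin
from urllib.request import urlopen

PAGE_URL = "https://www.cboe.com/us/equities/market_statistics/corporate_action/"


def resolve_href(page_url, href):
    """Turn an href found on page_url into an absolute URL.

    urljoin handles relative paths, root-relative paths and
    already-absolute URLs, which plain string concatenation does not.
    """
    return urljoin(page_url, href)


def download_text(file_url, dest, timeout=30):
    """Fetch file_url and write the raw bytes to dest (hypothetical helper).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so an
    error page is not silently saved as if it were the report.
    """
    with urlopen(file_url, timeout=timeout) as resp:
        data = resp.read()
    path = Path(dest)
    path.write_bytes(data)
    return path


# The href value shown in the question's HTML snippet.
href = "2022/06/bzx_equities_corporate_action_rpt_20220621.txt-dl"
file_url = resolve_href(PAGE_URL, href)
print(file_url)

# Uncomment to actually download the file (needs network access):
# download_text(file_url, "corporate_action_rpt_20220621.txt")
```

In the answer's code, the same idea would be urljoin(URL, textFileRow['href']), and textFileRow.get_text() gives the link text, which could serve as the local file name instead of the fixed "textFile.txt".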
