Saving a text file from a website using Python

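Using Python, my task is simply to fetch the HTML source from this site (https://www.cboe.com/us/equities/market_statistics/corporate_action/) and save the first text file listed there, named "corporate_action_rpt_20220621.txt". So far, I can use BeautifulSoup to read the following line of HTML from the site's source: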

<a href="2022/06/bzx_equities_corporate_action_rpt_20220621.txt-dl">corporate_action_rpt_20220621.txt</a>

Here is the code I am using:

import requests
from bs4 import BeautifulSoup
import os
URL = "https://www.cboe.com/us/equities/market_statistics/corporate_action/"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
table = soup.find('table')
textFileRow = table.tbody.find('tr').find('td').find('a')
print(textFileRow)

How can I open and save the text file from here using Python?

You have to fetch the file using the URL in the href of the a tag you have already retrieved, like this:

import requests
from bs4 import BeautifulSoup
import os
URL = "https://www.cboe.com/us/equities/market_statistics/corporate_action/"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
table = soup.find('table')
textFileRow = table.tbody.find('tr').find('td').find('a')
r = requests.get(URL + textFileRow['href'])
r.encoding = 'utf-8'
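# Write with an explicit encoding so the result does not depend on the platform default
with open("textFile.txt", "w", encoding="utf-8") as text_file:
    text_file.write(r.text)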
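As a possible refinement of the answer above, here is a minimal sketch that resolves the relative href with urllib.parse.urljoin instead of string concatenation, and saves the file under the link text rather than a fixed name. The helper names resolve_download_url, save_text, and download_first_report are hypothetical and not part of the original answer; the 30-second timeout is likewise an assumption.

```python
from pathlib import Path
from urllib.parse import urljoin

PAGE_URL = "https://www.cboe.com/us/equities/market_statistics/corporate_action/"


def resolve_download_url(page_url, href):
    """Resolve a relative href against the URL of the page it was found on."""
    return urljoin(page_url, href)


def save_text(text, filename, directory="."):
    """Write text as UTF-8 into directory, keeping only the base name of filename."""
    target = Path(directory) / Path(filename).name
    target.write_text(text, encoding="utf-8")
    return target


def download_first_report(href, link_text, directory="."):
    """Hypothetical helper: fetch the linked file and save it under the link text."""
    import requests  # third-party; imported here so the helpers above need only the stdlib

    response = requests.get(resolve_download_url(PAGE_URL, href), timeout=30)
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    response.encoding = "utf-8"
    return save_text(response.text, link_text, directory)
```

With the textFileRow tag from the code above, this could be called as download_first_report(textFileRow['href'], textFileRow.get_text(strip=True)), which would save the file as corporate_action_rpt_20220621.txt in the current directory.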
