Parsing through URLs in a CSV file - Python

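I have a CSV file of URLs, and I'm trying to write code that iterates over the URLs and adds the data from each one to specific keys in a dictionary. Unfortunately, whenever I try to use Beautiful Soup, the program either doesn't separate the URLs at all or only handles the first one. I know this is probably a simple problem, but I haven't been able to solve it with the answers to similar questions. Below is an excerpt of my code. Thanks for any guidance.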

csv_data:
'https://www.sec.gov/Archives/edgar/data/78003/000007800313000017,https://www.sec.gov/Archives/edgar/data/78003/000115752312004450,https://www.sec.gov/Archives/edgar/data/78003/000115752312002789,https://www.sec.gov/Archives/edgar/data/78003/000007800313000013,https://www.sec.gov/Archives/edgar/data/78003/000007800313000029,https://www.sec.gov/Archives/edgar/data/78003/000007800312000008,https://www.sec.gov/Archives/edgar/data/78003/000007800314000046'

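import requests
from bs4 import BeautifulSoup

# base_url is defined elsewhere in the original script (not shown)
# Note: if csv_data is one string, csv_data[1] is a single character, not a URL
content = requests.get(csv_data[1]).content
soup = BeautifulSoup(content, 'lxml')
reports = soup.find('myreports')  # lxml's HTML parser lowercases tag names
master_reports = []
for report in reports.find_all('report')[:-1]:  # skips the last <report>
    report_dict = {}
    report_dict['name_short'] = report.shortname.text
    report_dict['category'] = report.menucategory.text
    report_dict['url'] = base_url + report.htmlfilename.text
    master_reports.append(report_dict)
    print(base_url + report.htmlfilename.text)
    print(report.shortname.text)
    print(report.menucategory.text)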
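If csv_data holds all the URLs as one comma-separated string (which is what the answer below assumes), then csv_data[1] returns a single character rather than the second URL, so requests.get never sees a real URL. A quick offline check, using a two-URL stand-in string taken from the data above:

```python
# Two of the URLs from above, joined with a comma as in csv_data
csv_data = (
    "https://www.sec.gov/Archives/edgar/data/78003/000007800313000017,"
    "https://www.sec.gov/Archives/edgar/data/78003/000115752312004450"
)

# Indexing a str gives one character, not a URL
print(repr(csv_data[1]))

# Splitting on the comma gives a list with one URL per element
urls = csv_data.split(",")
print(len(urls))
print(urls[1])
```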

Is this what you're looking for: split the URL list and loop over it? If so, you'll need to collect the output of each iteration, which isn't coded here.
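A minimal sketch of that collection step, assuming each URL's reports end up as a list of dicts like report_dict. To keep it runnable offline, the fetch-and-parse step is a hypothetical stand-in (`fake_fetch_reports`) rather than a real requests/BeautifulSoup call; `collect_reports` is likewise an illustrative name. The point is that the accumulator is created once, before the outer loop, and filled per URL.

```python
from typing import Callable, Dict, List

Report = Dict[str, str]


def collect_reports(
    urls: List[str],
    fetch_reports: Callable[[str], List[Report]],
) -> Dict[str, List[Report]]:
    """Call fetch_reports on every URL and keep each URL's results."""
    all_reports: Dict[str, List[Report]] = {}  # created once, before the loop
    for url in urls:
        url = url.strip()
        if not url:
            continue  # skip empty entries, e.g. from a trailing comma
        all_reports[url] = fetch_reports(url)
    return all_reports


def fake_fetch_reports(url: str) -> List[Report]:
    """Hypothetical stand-in for the requests + BeautifulSoup step."""
    return [{"name_short": "example", "category": "example", "url": url}]


csv_data = (
    "https://www.sec.gov/Archives/edgar/data/78003/000007800313000017,"
    "https://www.sec.gov/Archives/edgar/data/78003/000115752312004450"
)
results = collect_reports(csv_data.split(","), fake_fetch_reports)

# One flat list across all URLs, like a master_reports that is never reset
master_reports = [r for reports in results.values() for r in reports]
for url, reports in results.items():
    print(url, len(reports))
print(len(master_reports))
```

In the real script, the body of the answer's inner loop would go into the function passed as fetch_reports, returning its list of dicts instead of resetting a shared list.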

import requests
from bs4 import BeautifulSoup

# base_url is defined as in the question (not shown)
csv_data = 'https://www.sec.gov/Archives/edgar/data/78003/000007800313000017,https://www.sec.gov/Archives/edgar/data/78003/000115752312004450,https://www.sec.gov/Archives/edgar/data/78003/000115752312002789,https://www.sec.gov/Archives/edgar/data/78003/000007800313000013,https://www.sec.gov/Archives/edgar/data/78003/000007800313000029,https://www.sec.gov/Archives/edgar/data/78003/000007800312000008,https://www.sec.gov/Archives/edgar/data/78003/000007800314000046'
csv_url_list = csv_data.split(',')  # one string -> list of URLs
for url in csv_url_list:
    content = requests.get(url).content
    soup = BeautifulSoup(content, 'lxml')
    reports = soup.find('myreports')
    if reports is None:  # page has no <myreports> element; skip it
        continue
    master_reports = []  # reset for every URL, so earlier results are not kept
    for report in reports.find_all('report')[:-1]:
        report_dict = {}
        report_dict['name_short'] = report.shortname.text
        report_dict['category'] = report.menucategory.text
        report_dict['url'] = base_url + report.htmlfilename.text
        master_reports.append(report_dict)
        print(base_url + report.htmlfilename.text)
        print(report.shortname.text)
        print(report.menucategory.text)
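Since the question says the URLs come from a CSV file, Python's csv module can read them directly instead of hard-coding the string. The sketch below is an assumption about the file layout: it accepts URLs on one comma-separated line, one per line, or a mix, and strips stray spaces and quote characters. io.StringIO stands in for open("urls.csv", newline=""), where the filename is hypothetical, and read_urls is an illustrative name.

```python
import csv
import io
from typing import Iterable, List


def read_urls(lines: Iterable[str]) -> List[str]:
    """Collect every non-empty field from a CSV source as a URL."""
    urls = []
    for row in csv.reader(lines):
        for field in row:
            field = field.strip().strip("'\"")  # drop spaces and stray quotes
            if field:
                urls.append(field)
    return urls


# Stand-in for open("urls.csv", newline=""); the filename is hypothetical
sample = io.StringIO(
    "https://www.sec.gov/Archives/edgar/data/78003/000007800313000017,"
    "https://www.sec.gov/Archives/edgar/data/78003/000115752312004450\n"
    "https://www.sec.gov/Archives/edgar/data/78003/000007800314000046\n"
)
urls = read_urls(sample)
for url in urls:
    print(url)
```

The resulting list can replace csv_url_list in the loop above.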
