如何读取CSV或文本文件的行，循环访问每一行，然后为每读取一行保存到新文件中

我有一个独特的问题，我认为我已经解决了，直到我使用 While 循环来控制这个程序的流程。

概要：

我有一个平面文件(CSV 或文本)，其中包含一些我要抓取的 URL，使用 BeautifulSoup 将新标签附加到 HTML(有效)，然后将每个抓取的 URL 保存到新文件名。

我需要什么：

遍历每一行
获取网址
抓取每个网址的页面
附加新的 HTML 标记
保存文件，如果可能，请使用 HTML 文件的名称
再次重新启动同一程序，从它转到下一行。

我非常确定这与我无法理解基础知识有关，我仍在努力解决这个问题。这是我的代码：

怎么了：

使用Python3，代码实际上有效，我使用 Jupyter 逐行观察代码和一系列打印语句，以查看 While 循环运行时返回的内容。

问题是只保存了一个文件，文件末尾的 URL 是唯一保存的内容。其他网址将被抓取。

如何在进入下一行之前让每一行迭代并抓取以唯一保存？我是否正确地使用这些构造？

网址：

https://www.imgacademy.com/media/headline/img-academy-alumna-jacqueline-bendrick-ready-tee-against-men-golfbc-championship

https://www.imgacademy.com/media/headline/img-academy-u19-girls-win-fysa-state-cup-u19-championship

https://www.imgacademy.com/media/headline/img-academy-celebrates-largest-commencement-ceremony-date-200-ascenders-earn

法典：

import csv
import requests
from bs4 import BeautifulSoup as BS
filename = 'urls.csv'
with open(filename, 'r+') as file:

while True:
line = file.readline()
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0'
headers = {'User-Agent':user_agent}
response = requests.get(line, headers)
print(response)
soup = BS(response.content, 'html.parser')
html = soup
title = soup.find('title')
meta = soup.new_tag('meta')
meta['name'] = "robots"
meta['content'] = "noindex, nofollow"
title.insert_after(meta)
for i 
with open('{}'".txt".format("line"), 'w', encoding='utf-8') as f:
outf.write(str(html))
if (line) == 0:
break

filename = 'urls.csv'
with open(filename, 'r+') as file:
#line = line.replace('n', '')
print(line)
for index, line  in enumerate(file):
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0'
headers = {'User-Agent':user_agent}
print(headers)
response = requests.get(line, headers)
print(response)
soup = BS(response.content, 'html.parser')
html = soup
title = soup.find('title')
meta = soup.new_tag('meta')
meta['name'] = "robots"
meta['content'] = "noindex, nofollow"
title.insert_after(meta)
with open('{}.html'.format(line[41:]), 'w', encoding='utf-8') as f:
f.write(str(html))

相关内容

最新更新

热门标签：