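Python web scraping: can't write to file after scraping

I am practicing web scraping on my own and trying to use Python to scrape a web novel series from a Chinese web novel site. After I put my Python code inside a function, it seems to have broken. I first wrote a piece of code like this: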


import requests
from bs4 import BeautifulSoup

# Fetch the table of contents and collect the chapter links
page = requests.get('https://www.51shucheng.net/zh-tw/wuxia/shediaoyingxiongzhuan')
soup = BeautifulSoup(page.content,'lxml')
page_list = soup.find_all(class_='mulu-list')
pages = page_list[0].find_all('a')
print(pages[0])
for i in range(len(pages)):
    pages[i] = pages[i].get('href')

# Download every chapter and append its text to a raw file
with open("射雕英雄傳1.txt", "w+") as file_object:
    for i in range(len(pages)):
        file_object.write('\n\n\t{}'.format(i+1))
        page = requests.get(pages[i])
        soup = BeautifulSoup(page.content,'lxml')
        content = soup.find(class_='neirong').text
        print(content[0:20])
        file_object.write(content)

# Copy the raw file to the final file, dropping the ad lines
with open('射雕英雄傳1.txt') as oldfile, open('射雕英雄傳.txt', 'w') as newfile:
    for line in oldfile:
        if not ('adsbygoogle' in line):
            newfile.write(line)

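And it worked very well. However, I wanted to wrap it in a function, so I made the following changes. After that it stopped working: the file "射雕英雄傳1.txt" is still created, but it is empty.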


import requests
from bs4 import BeautifulSoup

def scraping_novel(prefix,bookname):
    page = requests.get('https://www.51shucheng.net/zh-tw/wuxia/{}'.format(prefix))
    soup = BeautifulSoup(page.content,'lxml')

    page_list = soup.find_all(class_='mulu-list')
    pages = page_list[0].find_all('a')
    print(pages[0])
    for i in range(len(pages)):
        pages[i] = pages[i].get('href')

    with open("{}1.txt".format(bookname), "w+") as file_object:
        for i in range(len(pages)):
            file_object.write('\n\n\t{}'.format(i+1))
            page = requests.get(pages[i])
            soup = BeautifulSoup(page.content,'lxml')
            content = soup.find(class_='neirong').text
            print(content[0:20])
            file_object.write(content)
    with open("{}1.txt".format(bookname)) as oldfile, open("{}1.txt".format(bookname), 'w') as newfile:
        for line in oldfile:
            if not ('adsbygoogle' in line):
                newfile.write(line)

scraping_novel("shediaoyingxiongzhuan","射雕英雄傳")

#failed

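I have tried two things:

Switching the file name from Chinese to English, because I thought it might be an encoding problem, but that did not help. In fact, this is not the first time I have scraped a non-English website, and I have never seen anything like this.
Checking the content with print(content[0:20]), the second-to-last line of the first with statement. Everything looks fine there, so I think the problem is not with BS but with writing the file. There is nothing in the output file! By the way, the output file size is zero bytes.

I would appreciate it if anyone could tell me what is happening, because I still have no idea what went wrong.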

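# Create a file that has some content
with open("1.txt", "w+") as oldfile:
    oldfile.write('test')
# Reading it while writing to a different file works
differentName = "12.txt"
with open("1.txt", "r") as oldfile, open(differentName, 'w') as newfile:
    assert(len(oldfile.readlines()))
# Opening the same file with 'w' truncates it, so this assert fails
sameName = "1.txt"
with open(sameName, "r") as oldfile, open(sameName, 'w') as newfile:
    assert(len(oldfile.readlines()))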

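The typo mentioned by Lydia van Dyke causes the same file to be opened for writing, which truncates it to zero bytes before anything has been read from it. As a result, the loop over the lines of the old file executes 0 times.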
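To make the fix concrete, here is a minimal sketch of the filtering step written so that the source and destination can never be the same file. The helper name strip_ad_lines, its parameters, and the explicit UTF-8 encoding are assumptions made for illustration; they are not part of the original code.

```python
import os


def strip_ad_lines(src_path, dst_path, marker="adsbygoogle"):
    """Copy src_path to dst_path, skipping every line that contains marker.

    Hypothetical helper: the name, the parameters and the UTF-8 encoding
    are illustrative assumptions, not taken from the question.
    """
    # Opening a path with mode 'w' truncates it immediately, so refuse to
    # read from and write to the same file.
    if os.path.abspath(src_path) == os.path.abspath(dst_path):
        raise ValueError("source and destination must be different files")
    with open(src_path, encoding="utf-8") as oldfile, \
         open(dst_path, "w", encoding="utf-8") as newfile:
        for line in oldfile:
            if marker not in line:
                newfile.write(line)
```

Inside scraping_novel, the last with block could then be replaced by a call such as strip_ad_lines("{}1.txt".format(bookname), "{}.txt".format(bookname)), which matches the two file names used in the version that worked.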

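Are you using Python >= 3.6? Then write the output to a different file name:

open(f"{bookname}.txt", 'w') as newfile

If the goal was to overwrite the original file, though, I guess you can't do it that way: you are opening the same file for reading and for writing in one statement.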
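If overwriting the original file really is the goal, one possible approach is to finish reading before the file is reopened for writing. The sketch below assumes the whole file fits in memory; the helper name strip_ad_lines_in_place and the UTF-8 encoding are assumptions for illustration only.

```python
def strip_ad_lines_in_place(path, marker="adsbygoogle"):
    """Rewrite path without the lines that contain marker.

    Hypothetical helper: assumes the file is small enough to hold in
    memory and that it is UTF-8 encoded.
    """
    # First pass: read everything and close the file.
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    # Second pass: reopening with 'w' truncates the file, which is safe
    # now because the content is already held in `lines`.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line for line in lines if marker not in line)
```

For very large files, writing to a temporary file and then renaming it over the original would avoid holding everything in memory.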
