How do I modify a saved HTML page?



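I am trying to modify a saved HTML web page. More specifically, I want to highlight a particular sentence in the page and save the result as a new HTML page.

I thought the code below would work, but it does not: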

import re
# download https://en.wikipedia.org/wiki/HTML to disk using chrome / save as complete html
with open(r"C:UsersDownloadswebpage.html", mode='rt', encoding='utf-8') as f:
    mytext = f.read()
# highlight "The HyperText Markup Language, or HTML" in red
re.sub("The HyperText Markup Language, or HTML", mytext,
       '<span style="color: red">{}</span>'.format(r'/1'))
mytext.write(r"C:UsersDownloadswebpage_modif.html")

  File "<ipython-input-9-f7f9195da80f>", line 5, in <module>
    mytext.write(r"C:UsersDownloadswebpage_modif.html")
AttributeError: 'str' object has no attribute 'write'

Any ideas? Thanks!
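
On the error itself: mytext is a str, and strings have no write method, so the output has to go through a file object opened for writing. The re.sub call also passes its arguments in a different order than the function expects (pattern, replacement, string), '/1' is not a backreference (that is \1, and it needs a capture group), and the return value is thrown away. A minimal sketch of the regex approach with those points addressed follows; highlight and highlight_file are hypothetical helper names, and the paths are whatever you pass in.

```python
import re


def highlight(html, phrase):
    """Wrap every literal occurrence of `phrase` in a red <span>."""
    # re.escape makes the phrase match literally; the parentheses create
    # group 1, which the replacement refers to as \1.
    pattern = "(" + re.escape(phrase) + ")"
    # re.sub(pattern, replacement, string) returns a new string.
    return re.sub(pattern, r'<span style="color: red">\1</span>', html)


def highlight_file(src_path, dst_path, phrase):
    """Read src_path, highlight `phrase`, and write the result to dst_path."""
    with open(src_path, mode="rt", encoding="utf-8") as f:
        html = f.read()
    # Writing goes through a file object, not through the string itself.
    with open(dst_path, mode="wt", encoding="utf-8") as f:
        f.write(highlight(html, phrase))


sample = "<p>The HyperText Markup Language, or HTML, is a markup language.</p>"
print(highlight(sample, "The HyperText Markup Language, or HTML"))
```

This only matches when the sentence appears as one contiguous run of characters in the HTML source. If part of the sentence is wrapped in its own tag (a link or bold text, for example), the pattern will not match.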

Here is how to open an HTML file, edit it with bs4, and write it to a new file. I am assuming you are trying to add a style attribute to a span tag:

import requests
from bs4 import BeautifulSoup
from xml.sax.saxutils import unescape

# fetch the page and parse it
url = 'https://en.wikipedia.org/wiki/HTML'
res = requests.get(url).content
soup = BeautifulSoup(res, 'html.parser')
text_to_be_highlighted = "The HyperText Markup Language, or HTML"
highlighted_text = f'<span style="color: red">{text_to_be_highlighted}</span>'

# grab all tags with specified text
# (tag.text includes the text of descendants, so enclosing tags match as well)
tags = soup.find_all(lambda tag: text_to_be_highlighted in tag.text)
for tag in tags:
    new_text = tag.text.replace(text_to_be_highlighted, highlighted_text)
    # assigning to tag.string replaces the tag's contents with this text
    tag.string = new_text

# the inserted markup is escaped on output, so unescape it before writing
with open("new.html", "w", encoding='utf-8') as f:
    f.write(unescape(soup.prettify()))

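Explanation: use the find_all method with a lambda function to grab all tags that contain the specified text. Take the full text of each tag and replace the specified text with new markup that highlights it. Finally, write the modified soup to a new file.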
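
One thing to watch with the loop above: tag.text includes the text of all descendants, so outer tags such as html and body also match, and assigning to tag.string replaces everything inside them. A possible variation, sketched here under the assumption that the sentence sits inside a single text node, is to search for text nodes instead and insert a real span tag, which also makes the unescape step unnecessary. highlight_phrase is a hypothetical helper name, not part of bs4.

```python
from bs4 import BeautifulSoup, NavigableString


def highlight_phrase(html, phrase, color="red"):
    """Wrap the first occurrence of `phrase` in each matching text node."""
    soup = BeautifulSoup(html, "html.parser")
    # string=... makes find_all return text nodes rather than tags,
    # so enclosing tags and sibling elements are left alone.
    matches = soup.find_all(string=lambda s: s is not None and phrase in s)
    for node in matches:
        before, _, after = node.partition(phrase)
        span = soup.new_tag("span", style=f"color: {color}")
        span.string = phrase
        # Swap the text node for the span, then restore the surrounding text.
        node.replace_with(span)
        if before:
            span.insert_before(NavigableString(before))
        if after:
            span.insert_after(NavigableString(after))
    return str(soup)


sample = '<p><a href="/x">link</a> HTML rocks</p>'
print(highlight_phrase(sample, "HTML"))
```

For a page saved to disk, the same function can be fed the string returned by f.read() on a file opened with encoding='utf-8', and its return value written to a new file with f.write.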
