Beautiful Soup replaces < with &lt;

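I found the text I want to replace, but when I print soup the format changes. <div id="content">stuff here</div> becomes &lt;div id="content"&gt;stuff here&lt;/div&gt;. How can I preserve my data? I have tried print(soup.encode(formatter="none")), but that produces the same incorrect format.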

from bs4 import BeautifulSoup

# index_file is the path to the HTML file shown below; data is the replacement string
with open(index_file) as fp:
    soup = BeautifulSoup(fp, "html.parser")

found = soup.find("div", {"id": "content"})
found.replace_with(data)

When I print found, I get the correct format:

>>> print(found)
<div id="content">stuff</div>

The contents of index_file are as follows:

<!DOCTYPE html>
<head>
Apples 
</head>
<body>
<div id="page">
This is the Id of the page
<div id="main">
<div id="content">
stuff here
</div>
</div>
footer should go here
</div>
</body>
</html>

The found object is not a Python string; it is a Tag that happens to have a nice string representation. You can verify this by running:

type(found)

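Tag is part of the object hierarchy that Beautiful Soup creates so that you can interact with the HTML. Another such object is NavigableString. A NavigableString is a lot like a string, but it can only hold what goes into the content portion of the HTML.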
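As a small illustration of the two object types, the sketch below parses a one-line document (an assumed stand-in for index_file) and checks what find and .string return. The variable names are only for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Assumed stand-in for the real file: a single div with text inside.
soup = BeautifulSoup('<div id="content">stuff here</div>', "html.parser")
found = soup.find("div", {"id": "content"})

# find() returns a Tag, which is not a str even though it prints like one.
found_is_tag = isinstance(found, Tag)
found_is_str = isinstance(found, str)

# The text inside the Tag is a NavigableString, which does subclass str.
text_node = found.string
text_is_navigable = isinstance(text_node, NavigableString)
text_is_str = isinstance(text_node, str)

print(found_is_tag, found_is_str, text_is_navigable, text_is_str)
```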

When you do

found.replace_with('<div id="content">stuff here</div>')

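you are asking to replace the Tag with a NavigableString containing that literal text. The only way HTML can display that string is to escape all the angle brackets, which is what it does.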
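The escaping described above can be reproduced with a short sketch. The inline HTML is an assumed, trimmed-down version of index_file, and the replacement text is arbitrary.

```python
from bs4 import BeautifulSoup

# Assumed, trimmed-down version of the file from the question.
html = '<div id="main"><div id="content">stuff here</div></div>'
soup = BeautifulSoup(html, "html.parser")
found = soup.find("div", {"id": "content"})

# A plain str is inserted as text, not as markup.
found.replace_with('<div id="content">new stuff</div>')

# When the tree is serialized, the angle brackets in that text are escaped.
escaped_output = str(soup)
print(escaped_output)

# The tree no longer contains a div with id="content"; only text remains.
content_div = soup.find("div", {"id": "content"})
```

Here content_div ends up as None, because the original element was replaced by a text node rather than by a new element.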

Instead of that mess, you probably want to keep your Tag and replace only its contents:

found.string.replace_with('stuff here')

Note that the correct replacement does not attempt to overwrite the tags.
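A minimal sketch of the content-only replacement follows, using an assumed inline document in place of index_file. It also shows one alternative that is not part of the answer above: if the whole element really has to be swapped, the replacement markup can be parsed first so that a Tag, rather than a string, is inserted.

```python
from bs4 import BeautifulSoup

# Assumed, trimmed-down version of the file from the question.
html = '<div id="main"><div id="content">stuff here</div></div>'
soup = BeautifulSoup(html, "html.parser")
found = soup.find("div", {"id": "content"})

# Option 1: keep the Tag and swap only the text inside it.
found.string.replace_with("new stuff")
text_only_result = str(soup)

# Option 2 (an alternative, not from the answer above): parse the
# replacement markup so that a real Tag is inserted instead of a str.
replacement_markup = '<div id="content">parsed stuff</div>'
new_tag = BeautifulSoup(replacement_markup, "html.parser").div
found.replace_with(new_tag)
parsed_result = str(soup)

print(text_only_result)
print(parsed_result)
```

In both cases the serialized output keeps its angle brackets, because the tree contains elements and text rather than text that looks like markup.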

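When you call found.replace_with(...), the object referred to by the name found is replaced in the parent hierarchy. However, the name found keeps pointing to the same outdated object as before. That is why printing soup shows the update, but printing found does not.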
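The stale reference can be seen directly in a short sketch. The inline HTML and the replacement text are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Assumed, trimmed-down version of the file from the question.
html = '<div id="main"><div id="content">stuff</div></div>'
soup = BeautifulSoup(html, "html.parser")
found = soup.find("div", {"id": "content"})

found.replace_with("plain text")

# The name found still refers to the old Tag, now detached from the tree.
detached_repr = str(found)
detached_parent = found.parent

# The tree itself reflects the replacement.
tree_repr = str(soup)

print(detached_repr)
print(tree_repr)
```

After the call, found.parent is None, which is a quick way to tell that the object is no longer part of soup.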
