How to wrap a new tag around multiple tags with BeautifulSoup



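In the example below, I am trying to wrap a single <content> tag around all the <p> tags in a section. Each section is inside an <item>, but the <title> needs to stay outside the <content>. How can I do this?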

Source file:

<item>
<title>Heading for Sec 1</title>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
</item>
<item>
<title>Heading for Sec 2</title>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
</item>
<item>
<title>Heading for Sec 3</title>
<p>some text sec 3</p>
<p>some text sec 3</p>
</item>

I want this output:

<item>
<title>Heading for Sec 1</title>
<content>
<p>some text sec 1</p>
<p>some text sec 1</p>
</content>
</item>
<item>
<title>Heading for Sec 2</title>
<content>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
</content>
</item>
<item>
<title>Heading for Sec 3</title>
<content>
<p>some text sec 3</p>
<p>some text sec 3</p>
</content>
</item>

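Below is the code I am trying. However, it wraps a <content> tag around each individual <p> tag instead of wrapping one <content> around all the <p> tags in a section. How can I fix this?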

from bs4 import BeautifulSoup
with open('testdoc.txt', 'r') as f:
    soup = BeautifulSoup(f, "html.parser")
content = None
for tag in soup.select("p"):
    if tag.name == "p":
        content = tag.wrap(soup.new_tag("content"))
        content.append(tag)
        continue
print(soup)

Try:

from bs4 import BeautifulSoup
html_doc = """
<item>
<title>Heading for Sec 1</title>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
</item>
<item>
<title>Heading for Sec 2</title>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
</item>
<item>
<title>Heading for Sec 3</title>
<p>some text sec 3</p>
<p>some text sec 3</p>
</item>"""

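soup = BeautifulSoup(html_doc, "html.parser")

for item in soup.select("item"):
    # Create one <content> per <item> and place it right after the <title>.
    t = soup.new_tag("content")
    t.append("\n")
    item.title.insert_after(t)
    item.title.insert_after("\n")

    # Move every <p> of this item into the new <content> tag.
    for p in item.select("p"):
        t.append(p)
        t.append("\n")

    # Merge adjacent strings, then normalize the leftover whitespace
    # between the item's direct children to a single newline.
    # (`string=` is the current name of the older `text=` argument.)
    item.smooth()
    for node in item.find_all(string=True, recursive=False):
        node.replace_with("\n")

print(soup)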

Prints:

<item>
<title>Heading for Sec 1</title>
<content>
<p>some text sec 1</p>
<p>some text sec 1</p>
<p>some text sec 1</p>
</content>
</item>
<item>
<title>Heading for Sec 2</title>
<content>
<p>some text sec 2</p>
<p>some text sec 2</p>
<p>some text sec 2</p>
</content>
</item>
<item>
<title>Heading for Sec 3</title>
<content>
<p>some text sec 3</p>
<p>some text sec 3</p>
</content>
</item>
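
If the whitespace layout does not matter, a shorter variant is possible. This is only a sketch, not part of the answer above: it wraps the first <p> of each <item> with `wrap()` (which returns the new wrapper tag) and then moves the remaining <p> tags into that wrapper with `append()`. The helper name `wrap_paragraphs`, its `wrapper_name` parameter, and the shortened sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened sample markup (hypothetical, no whitespace between tags).
html_doc = (
    "<item><title>Heading for Sec 1</title><p>a</p><p>b</p></item>"
    "<item><title>Heading for Sec 2</title><p>c</p></item>"
)


def wrap_paragraphs(markup, wrapper_name="content"):
    """Wrap all direct <p> children of each <item> in one wrapper tag."""
    soup = BeautifulSoup(markup, "html.parser")
    for item in soup.find_all("item"):
        paragraphs = item.find_all("p", recursive=False)
        if not paragraphs:
            continue  # nothing to wrap in this item
        # wrap() puts the wrapper where the first <p> was and returns it.
        wrapper = paragraphs[0].wrap(soup.new_tag(wrapper_name))
        # append() moves each remaining <p> out of <item> into the wrapper.
        for p in paragraphs[1:]:
            wrapper.append(p)
    return soup


result = str(wrap_paragraphs(html_doc))
print(result)
```

Because the wrapper takes the position of the first <p>, the <title> stays outside it. Any newline strings that sat between the original <p> tags are left behind in the <item>, so on the multi-line source file the output would need the same whitespace cleanup as in the answer above.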
