How to append HTML content to existing HTML content



I have this old HTML:

<h1>Health Authority Updates</h1><h2>North America</h2><h3><a id="_US_guidances/regulations"></a>US
guidances/regulations</h3>
<ol>
<li>Final Guidance: 25-May-2021: <a
href="https://www.fda.gov/regulatory-information/search-fda-guidance-documents/emergency-use-authorization-vaccines-prevent-covid-19">Emergency
Use Authorization for Vaccines to Prevent COVID-19: Guidance for Industry</a>
<ol>
<li>abc</li>
<li>def</li>
</ol>
</li>
</ol><h2>Asia-Pacific </h2><h3><a id="_Australia_guidances/regulations"></a>Australia guidances/regulations</h3>
<ol>
<li>Guidance: 04-Sep-2020: <a href="https://www.cortellis.com/intelligence/report/ri/regulatory/238041">Cortellis
Report on In Vitro Diagnostics Regulatory Framework</a>
<ol>
<li>This Regulatory Summary is related to specific Regulation for In Vitro Diagnostics in Australia. It
provides definitions and outlines legal framework from different points of view (manufacturers,
importers and distributors). It gives information about Registration procedures, provides practical help
on how to obtain its notification. This document also contains detailed information about fees, clinical
trials, post-marketing vigilance system, labeling, pricing and reimbursement and advertising.
</li>
<li>Content Update on <strong>04-Sep-2020</strong>:
<ol>
<li>One</li>
<li>Two</li>
<li>three</li>
</ol>
</li>
</ol>
</li>
</ol>

This is the new HTML:

<h2>North America</h2><h3>US guidances/regulations</h3>
<ol>
<li>2021-06-22:<a href=http://www.minsa.gob.pa/noticia/arranca-esperado-proceso-de-vacunacion-en-chiriqui> Emergency
Use Authorization for Vaccines to Prevent weweCOVID-19: Guidance for Industry 22</a>
<ol>
<li> first list</li>
<li> Second</li>
</ol>
</li>
</ol><h2>Asia Pacific</h2><h3>Australia guidances/regulations</h3>
<ol>
<li>2021-06-22:<a href=http://www.minsa.gob.pa/noticia/arranca-esperado-proceso-de-vacunacion-en-chiriqui> Emergency
Use Authorization for Vaccines to Prevent weweCOVID-19: Guidance for Industry 22</a>
<ol>
<li> first list</li>
<li> Second</li>
</ol>
</li>
</ol>

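I need to take the content under "US guidances/regulations" in the second HTML and add it at the beginning of the "US guidances/regulations" list in the first HTML, and do the same for Australia. Here is my code: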

from bs4 import BeautifulSoup

# html_string holds the old HTML, html_string_new holds the new HTML
soup1 = BeautifulSoup(html_string, "html.parser")
soup2 = BeautifulSoup(html_string_new, "html.parser")
for li in soup2.select("h3 + ol > li"):
    h3_text = li.find_previous("h3").get_text(strip=True)
    h3_soup1 = soup1.find("h3")
    if not h3_soup1:
        continue
    h3_soup1.find_next("ol").insert(0, li)

The problem is that it inserts everything under the US section:

<h1>Health Authority Updates</h1><h2>North America</h2><h3><a id="_US_guidances/regulations"></a>US
guidances/regulations</h3>
<ol>
<li>2021-06-22:<a href="http://www.minsa.gob.pa/noticia/arranca-esperado-proceso-de-vacunacion-en-chiriqui">
Emergency Use Authorization for Vaccines to Prevent weweCOVID-19: Guidance for Industry 22</a>
<ol>
<li> first list</li>
<li> Second</li>
</ol>
</li>
<li>2021-06-22:<a href="http://www.minsa.gob.pa/noticia/arranca-esperado-proceso-de-vacunacion-en-chiriqui">
Emergency Use Authorization for Vaccines to Prevent weweCOVID-19: Guidance for Industry 22</a>
<ol>
<li> first list</li>
<li> Second</li>
</ol>
</li>
<li>Final Guidance: 25-May-2021: <a
href="https://www.fda.gov/regulatory-information/search-fda-guidance-documents/emergency-use-authorization-vaccines-prevent-covid-19">Emergency
Use Authorization for Vaccines to Prevent COVID-19: Guidance for Industry</a>
<ol>
<li>abc</li>
<li>def</li>
</ol>
</li>
</ol><h2>Asia-Pacific </h2><h3><a id="_Australia_guidances/regulations"></a>Australia guidances/regulations</h3>
<ol>
<li>Guidance: 04-Sep-2020: <a href="https://www.cortellis.com/intelligence/report/ri/regulatory/238041">Cortellis
Report on In Vitro Diagnostics Regulatory Framework</a>
<ol>
<li>This Regulatory Summary is related to specific Regulation for In Vitro Diagnostics in Australia. It
provides definitions and outlines legal framework from different points of view (manufacturers,
importers and distributors). It gives information about Registration procedures, provides practical help
on how to obtain its notification. This document also contains detailed information about fees, clinical
trials, post-marketing vigilance system, labeling, pricing and reimbursement and advertising.
</li>
<li>Content Update on <strong>04-Sep-2020</strong>:
<ol>
<li>One</li>
<li>Two</li>
<li>three</li>
</ol>
</li>
</ol>
</li>
</ol>

I tried replacing h3_soup1 = soup1.find("h3") with h3_soup1 = soup1.find("h3", text=h3_text), but it returns None.
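
A likely reason for the None result, sketched below on a single heading copied from the old HTML: a text (or string) filter passed together with a tag name is compared against the tag's .string, and .string is None when the tag has more than one child. Here the h3 contains an empty a tag plus a text node that also spans a line break. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# One heading taken from the old HTML, including the line break inside it.
html = '<h3><a id="_US_guidances/regulations"></a>US\nguidances/regulations</h3>'
soup = BeautifulSoup(html, "html.parser")
h3 = soup.h3

# Two children (an empty <a> and a text node), so .string is None and a
# string filter has nothing to compare against.
print(h3.string)

# The exact-text lookup therefore finds nothing.
found = soup.find("h3", string="US guidances/regulations")
print(found)

# Joining all text nodes and collapsing whitespace gives a comparable heading.
normalized = " ".join(h3.get_text().split())
print(normalized)
```

Comparing the normalized heading text, rather than relying on the string filter, sidesteps both the extra child tag and the embedded line break.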

Edit:

Expected output:

<h1>Health Authority Updates</h1><h2>North America</h2><h3><a id="_US_guidances/regulations"></a>US
guidances/regulations</h3>
<ol>
<li>2021-06-22:<a href="http://www.minsa.gob.pa/noticia/arranca-esperado-proceso-de-vacunacion-en-chiriqui">
Emergency Use Authorization for Vaccines to Prevent weweCOVID-19: Guidance for Industry 22</a>
<ol>
<li> first list</li>
<li> Second</li>
</ol>
</li>
<li>Final Guidance: 25-May-2021: <a
href="https://www.fda.gov/regulatory-information/search-fda-guidance-documents/emergency-use-authorization-vaccines-prevent-covid-19">Emergency
Use Authorization for Vaccines to Prevent COVID-19: Guidance for Industry</a>
<ol>
<li>abc</li>
<li>def</li>
</ol>
</li>
</ol><h2>Asia-Pacific </h2><h3><a id="_Australia_guidances/regulations"></a>Australia guidances/regulations</h3>
<ol>
<li>2021-06-22:<a href="http://www.minsa.gob.pa/noticia/arranca-esperado-proceso-de-vacunacion-en-chiriqui">
Emergency Use Authorization for Vaccines to Prevent weweCOVID-19: Guidance for Industry 22</a>
<ol>
<li> first list</li>
<li> Second</li>
</ol>
</li>
<li>Guidance: 04-Sep-2020: <a href="https://www.cortellis.com/intelligence/report/ri/regulatory/238041">Cortellis
Report on In Vitro Diagnostics Regulatory Framework</a>
<ol>
<li>This Regulatory Summary is related to specific Regulation for In Vitro Diagnostics in Australia. It
provides definitions and outlines legal framework from different points of view (manufacturers,
importers and distributors). It gives information about Registration procedures, provides practical help
on how to obtain its notification. This document also contains detailed information about fees, clinical
trials, post-marketing vigilance system, labeling, pricing and reimbursement and advertising.
</li>
<li>Content Update on <strong>04-Sep-2020</strong>:
<ol>
<li>One</li>
<li>Two</li>
<li>three</li>
</ol>
</li>
</ol>
</li>
</ol>

Try:

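import re

from bs4 import BeautifulSoup

soup1 = BeautifulSoup(html_string, "html.parser")
soup2 = BeautifulSoup(html_string_new, "html.parser")


def fn(txt, tag):
    # Only <h3> tags can match.
    if tag.name != "h3":
        return False
    # Collapse whitespace runs (including line breaks inside the heading)
    # so the heading can be compared with the text from the new HTML.
    t = re.sub(r"\s+", " ", tag.get_text(strip=True))
    return txt in t


for li in soup2.select("h3 + ol > li"):
    h3_text = li.find_previous("h3").get_text(strip=True)
    # Find the <h3> in the old HTML whose text contains the new heading.
    h3_soup1 = soup1.find(lambda t: fn(h3_text, t))
    if not h3_soup1:
        continue
    h3_soup1.find_next("ol").insert(0, li)

print(soup1)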

Prints:


<h1>Health Authority Updates</h1><h2>North America</h2><h3><a id="_US_guidances/regulations"></a>US
guidances/regulations</h3>
<ol><li>2021-06-22:<a href="http://www.minsa.gob.pa/noticia/arranca-esperado-proceso-de-vacunacion-en-chiriqui"> Emergency
Use Authorization for Vaccines to Prevent weweCOVID-19: Guidance for Industry 22</a>
<ol>
<li> first list</li>
<li> Second</li>
</ol>
</li>
<li>Final Guidance: 25-May-2021: <a href="https://www.fda.gov/regulatory-information/search-fda-guidance-documents/emergency-use-authorization-vaccines-prevent-covid-19">Emergency
Use Authorization for Vaccines to Prevent COVID-19: Guidance for Industry</a>
<ol>
<li>abc</li>
<li>def</li>
</ol>
</li>
</ol><h2>Asia-Pacific </h2><h3><a id="_Australia_guidances/regulations"></a>Australia guidances/regulations</h3>
<ol><li>2021-06-22:<a href="http://www.minsa.gob.pa/noticia/arranca-esperado-proceso-de-vacunacion-en-chiriqui"> Emergency
Use Authorization for Vaccines to Prevent weweCOVID-19: Guidance for Industry 22</a>
<ol>
<li> first list</li>
<li> Second</li>
</ol>
</li>
<li>Guidance: 04-Sep-2020: <a href="https://www.cortellis.com/intelligence/report/ri/regulatory/238041">Cortellis
Report on In Vitro Diagnostics Regulatory Framework</a>
<ol>
<li>This Regulatory Summary is related to specific Regulation for In Vitro Diagnostics in Australia. It
provides definitions and outlines legal framework from different points of view (manufacturers,
importers and distributors). It gives information about Registration procedures, provides practical help
on how to obtain its notification. This document also contains detailed information about fees, clinical
trials, post-marketing vigilance system, labeling, pricing and reimbursement and advertising.
</li>
<li>Content Update on <strong>04-Sep-2020</strong>:
<ol>
<li>One</li>
<li>Two</li>
<li>three</li>
</ol>
</li>
</ol>
</li>
</ol>
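
If a section in the new HTML can contain more than one top-level list item, inserting each one at index 0 in document order would reverse them. The following is a minimal sketch of one way to handle that, not part of the original answer: it assumes each h3 is followed by a sibling ol in both documents, and the helper names normalize and prepend_by_heading are hypothetical.

```python
from bs4 import BeautifulSoup


def normalize(text):
    """Collapse every run of whitespace (including newlines) to one space."""
    return " ".join(text.split())


def prepend_by_heading(old_html, new_html):
    """Prepend the new items of each <h3> section to the matching old section."""
    old = BeautifulSoup(old_html, "html.parser")
    new = BeautifulSoup(new_html, "html.parser")

    # Map each normalized <h3> heading in the old document to the <ol> after it.
    targets = {}
    for h3 in old.find_all("h3"):
        ol = h3.find_next_sibling("ol")
        if ol is not None:
            targets[normalize(h3.get_text())] = ol

    for h3 in new.find_all("h3"):
        target = targets.get(normalize(h3.get_text()))
        source = h3.find_next_sibling("ol")
        if target is None or source is None:
            continue  # no matching section in the old document
        # Walk the new top-level items in reverse so that inserting each one
        # at index 0 keeps their original order.
        for li in reversed(source.find_all("li", recursive=False)):
            target.insert(0, li)
    return old


old_html = (
    '<h3><a id="x"></a>US\nguidances</h3>\n<ol><li>old</li></ol>'
    "<h3>Australia guidances</h3><ol><li>old-au</li></ol>"
)
new_html = (
    "<h3>US guidances</h3><ol><li>n1</li><li>n2</li></ol>"
    "<h3>Australia guidances</h3><ol><li>au1</li></ol>"
)
result = prepend_by_heading(old_html, new_html)
print(result)
```

Headings are matched here by exact equality after whitespace normalization, which is stricter than the substring check used above; whether that is preferable depends on how consistent the headings are between the two documents.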
