How to remove a parent element in BeautifulSoup

Given this HTML structure:

<strong><a href="https://www.fertilizer.com/2021/07/bvfcl.html" target="_blank">Fertilizer Corporation Limited</a> (BVFCL)</strong> has released an employment notification for the recruitment of <strong>11 DGM, Company Secretary, Finance Manager and Accounts Officer Vacancy</strong>

If fertilizer.com appears anywhere in the HTML structure, I need to remove the whole element/tag.

So the final result should be:

null

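I learned that bs4 has a decompose() method for removing elements, but how do I remove the parent element, and how do I navigate to it?

Please guide me. Thanks.

Given the only HTML snippet provided, this would be my solution: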

from bs4 import BeautifulSoup

txt = '''
<strong>
<a href="https://www.fertilizer.com/2021/07/bvfcl.html" target="_blank">Fertilizer Corporation Limited</a> (BVFCL)
</strong> 
has released an employment notification for the recruitment of 
<strong>11 DGM, Company Secretary, Finance Manager and Accounts Officer Vacancy
</strong> 
'''
soup = BeautifulSoup(txt, 'html.parser')
print(f'Content Before decomposition:\n{soup}')
target = "www.fertilizer.com"
# Collect every href that contains the target domain
hrefs = [link['href'] for link in soup.find_all('a', href=True) if target in link['href']]
print(hrefs)  # ['https://www.fertilizer.com/2021/07/bvfcl.html']
if hrefs:  # Means we found it
    # Destroy the whole parsed tree, not just the matching tag
    soup.decompose()
print(f'Content After decomposition:\n{soup}')
# <None></None>
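
The snippet above discards the entire document. If the goal is only to drop the tag that encloses the matching link, which is what the question's mention of a parent element suggests, one possible sketch uses find_parent() to walk up from the <a> and then calls decompose() on that ancestor. The helper name remove_enclosing_tag, its parent_name parameter, and the choice of <strong> as the ancestor to remove are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

html = (
    '<strong><a href="https://www.fertilizer.com/2021/07/bvfcl.html" '
    'target="_blank">Fertilizer Corporation Limited</a> (BVFCL)</strong>'
    ' has released an employment notification for the recruitment of '
    '<strong>11 DGM, Company Secretary, Finance Manager and Accounts '
    'Officer Vacancy</strong>'
)


def remove_enclosing_tag(soup, target, parent_name="strong"):
    """Hypothetical helper: for each <a> whose href contains `target`,
    decompose its nearest `parent_name` ancestor (or the link itself
    if no such ancestor exists). Returns the number of removals."""
    removed = 0
    while True:
        # Search again after every removal so we never touch a destroyed tag
        link = soup.find("a", href=lambda h: h is not None and target in h)
        if link is None:
            break
        parent = link.find_parent(parent_name)  # navigate up the tree
        (parent if parent is not None else link).decompose()
        removed += 1
    return removed


soup = BeautifulSoup(html, "html.parser")
removed = remove_enclosing_tag(soup, "www.fertilizer.com")
print(removed)
print(soup)
```

For the direct parent only, link.parent gives the same tag in this snippet; find_parent() is used here because it keeps working when the link is nested more deeply.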

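Another solution, if you want to end up with nothing at all, is shown below. Note that the second loop removes the free text that is not wrapped in any particular tag.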

from bs4 import BeautifulSoup

txt = '''
<strong>
<a href="https://www.fertilizer.com/2021/07/bvfcl.html" target="_blank">Fertilizer Corporation Limited</a> (BVFCL)
</strong> 
has released an employment notification for the recruitment of 
<strong>11 DGM, Company Secretary, Finance Manager and Accounts Officer Vacancy
</strong> 
'''
soup = BeautifulSoup(txt, 'html.parser')
print(f'Content Before decomposition:\n{soup}')
target = "www.fertilizer.com"
hrefs = [link['href'] for link in soup.find_all('a', href=True) if target in link['href']]
print(hrefs)  # ['https://www.fertilizer.com/2021/07/bvfcl.html']
if hrefs:  # Means we found it
    # Handles tags
    for el in soup.find_all():
        el.replace_with("")
    # Handles free text like: 'has released an employment notification for the recruitment of ' (because it is not in a particular tag)
    for el in soup.find_all(string=True):
        el.replace_with("")
print(f'Content After decomposition:\n{soup}')
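
If the two replacement loops feel heavy, clear() may be a shorter route to the same empty result, since a BeautifulSoup object inherits it from Tag. This is a minimal sketch, assuming that "null" in the question means an empty document; mapping the empty string to None at the end is my own assumption, not something the question specifies.

```python
from bs4 import BeautifulSoup

html = (
    '<strong><a href="https://www.fertilizer.com/2021/07/bvfcl.html" '
    'target="_blank">Fertilizer Corporation Limited</a> (BVFCL)</strong>'
    ' has released an employment notification for the recruitment of '
    '<strong>11 DGM, Company Secretary, Finance Manager and Accounts '
    'Officer Vacancy</strong>'
)

soup = BeautifulSoup(html, "html.parser")
target = "www.fertilizer.com"

# Look for any link whose href mentions the target domain
match = soup.find("a", href=lambda h: h is not None and target in h)

if match is not None:
    # Remove every child (tags and free text) but keep the soup object usable
    soup.clear()

# Assumption: treat an empty document as None ("null" in the question)
result = str(soup) or None
print(result)
```

Unlike decompose(), clear() leaves the soup object itself intact, so it can still be searched or printed afterwards.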
