Python string editing



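My initial string contains a chunk that starts with <span> and ends with </span></span. I want to remove that whole chunk from the string (the span tags, everything inside them, and the closing /span). How can I do that?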

The part of the string to remove: "<span class="_5mfr"><span class="_6qdm" style='height: 16px; width: 16px; font-size: 16px; background-image: url("https://static.xx.fbcdn.net/images/emoji.php/v9/t81/1/16/")+14 variable strings+</span></span

I want to remove the entire fragment shown above.

import re
txt = 'Iam a good boy <span>some blahblahblah </span</span and my name is john'
print(re.sub(r'<span>.*</span</span ', '', txt))

Prints:

Iam a good boy and my name is john
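One caveat with the pattern above: .* is greedy, so if the string contains two such chunks, a single match runs from the first <span> to the last </span</span and deletes the normal text between them. The sketch below uses a hypothetical two-chunk input to show the difference with the non-greedy .*? form.

```python
import re

# Hypothetical input with TWO chunks to remove and normal text in between.
txt = 'a <span>x </span</span b <span>y </span</span c'

# Greedy: '.*' runs to the LAST '</span</span ', swallowing 'b' as well.
greedy = re.sub(r'<span>.*</span</span ', '', txt)

# Non-greedy: '.*?' stops at the FIRST closing sequence, so each chunk
# is removed separately and the text between them survives.
lazy = re.sub(r'<span>.*?</span</span ', '', txt)

print(greedy)  # a c
print(lazy)    # a b c
```

With a single chunk both patterns behave the same; the non-greedy form is simply the safer default when the input may contain several chunks.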

For the updated question:

import re
txt = """<span class="_5mfr"><span class="_6qdm" style='height: 16px; width: 16px; font-size: 16px; background-image: url("https://static.xx.fbcdn.net/images/emoji.php/v9/t81/1/16/")+14 variable strings+</span></span"""
print(re.sub(r'<span [^<>]*?</span>?</span', '', txt))
# prints: <span class="_5mfr">
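The pattern above leaves the outer <span class="_5mfr"> behind. If the outer opening tag should go too, one option is to match it explicitly in front of the inner span. This is a sketch that assumes the fragment always has exactly this shape (outer tag, inner tag whose text contains no < or >, then the two closing tags with the final > possibly missing); PATTERN is a hypothetical name, not part of the answer above.

```python
import re

# Hypothetical pattern that also consumes the OUTER <span ...> tag:
#   <span[^>]*>       outer opening tag, e.g. <span class="_5mfr">
#   <span [^<>]*      inner tag plus its text, up to the next '<'
#   </span>?</span>?  the two closing tags; the final '>' may be missing
PATTERN = re.compile(r'<span[^>]*><span [^<>]*</span>?</span>?')

fragment = """<span class="_5mfr"><span class="_6qdm" style='height: 16px; width: 16px; font-size: 16px; background-image: url("https://static.xx.fbcdn.net/images/emoji.php/v9/t81/1/16/")+14 variable strings+</span></span"""

txt = 'before ' + fragment + ' after'
result = PATTERN.sub('', txt)
print(result)  # before  after
```

The fragment is removed entirely; the two spaces that surrounded it remain, so a follow-up whitespace cleanup may be wanted.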

Using BeautifulSoup:

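from bs4 import BeautifulSoup

soup = BeautifulSoup(string, 'html.parser')  # string holds the original text
for x in soup.find_all('span'):
    x.replace_with('')  # remove the tag together with its contents
print(str(soup))  # soup.string would be None when more than one node remains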
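If installing BeautifulSoup is not an option, the standard library's html.parser.HTMLParser can do a similar job. This is a swapped-in technique, not the answer above: a minimal sketch that assumes the spans are well formed (a malformed closing sequence like </span</span will not be recognized as an end tag). SpanStripper and strip_spans are hypothetical names. It keeps only text outside every span, so any other tags outside the spans are dropped as well, leaving just their text.

```python
from html.parser import HTMLParser


class SpanStripper(HTMLParser):
    """Hypothetical helper: collects text that is NOT inside any <span>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0   # number of <span> tags currently open
        self.parts = []  # text kept from outside all spans

    def handle_starttag(self, tag, attrs):
        if tag == 'span':
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'span' and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only text at nesting depth 0 is kept; other tags are not re-emitted.
        if self.depth == 0:
            self.parts.append(data)


def strip_spans(html):
    parser = SpanStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return ''.join(parser.parts)


result = strip_spans('Iam a good boy <span class="a"><span>emoji</span></span>and my name is john')
print(result)  # Iam a good boy and my name is john
```

Tracking the nesting depth is what lets this handle nested spans, which a single regular expression cannot do in general.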

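You can remove every match of a regular expression, as shown below. Note that this pattern strips only the <span> and </span> tags themselves; the text inside them is kept.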

import re
regex = r"(<span.+?>)|(</span>)"
test_str = "<span class=\"_5mfr\"><span class=\"_6qdm\" style='height: 16px; width: 16px; font-size: 16px; background-image: url(\"static.xx.fbcdn.net/images/emoji.php/v9/t81/1/16/…\")'>© Dasamoolam Damu (Troll Malayalam)ഹൗ ക്രൂവൽ<span class=\"_5mfr\"><span class=\"_6qdm\" style='height: 16px; width: 16px; font-size: 16px; background-image: url(\"static.xx.fbcdn.net/images/emoji.php/v9/td7/1/16/…\")'></span></span></span></span>"
print(re.sub(regex, '', test_str))
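To delete the spans together with their contents while still handling nesting, one option is to repeatedly remove the innermost span (an opening tag, text without <, and its closing tag) until the string stops changing. This is a sketch under the assumption that tags are well formed and that no attribute value contains a > character; remove_spans and INNER_SPAN are hypothetical names.

```python
import re

# Matches one INNERMOST span: an opening tag, text without '<', and its close.
INNER_SPAN = re.compile(r'<span[^>]*>[^<]*</span>')


def remove_spans(html):
    """Hypothetical helper: delete <span> elements together with their contents.

    A single regex cannot match arbitrarily nested tags, so the innermost
    spans are peeled off repeatedly until nothing changes.
    """
    while True:
        new = INNER_SPAN.sub('', html)
        if new == html:
            return new
        html = new


s = 'before <span class="a">text<span class="b"></span></span> after'
result = remove_spans(s)
print(result)  # before  after
```

Each pass removes one level of nesting, so deeply nested input needs more passes; for anything beyond simple fragments, an HTML parser is the more robust choice.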
