Python: split a string on commas only where the comma appears between two specific characters, > and <

I have a string:

<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>, <div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>, <div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>, <div class="options mceEditable">The proteins may either be carriers or receptors only</div>, <div class="options mceEditable">It is a 3-layered lipid structure</div>

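I want to split the string above on commas, but only where the comma sits between > and <, that is, on the sequence ">, <". Commas inside the text of a div should be left alone.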

Expected output:

['<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>',
'<div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>',
'<div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>',
'<div class="options mceEditable">The proteins may either be carriers or receptors only</div>',
'<div class="options mceEditable">It is a 3-layered lipid structure</div>']

What I tried:

options = test3.split(">, <")
options=options.replace("</div'","</div>'")

Neither of the two attempts above produced the expected result. Can anyone help?

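Normally I would not recommend regular expressions for anything XML/HTML-related, but since your input is in some processed form that is no longer valid markup, I would say a regex is acceptable in this case if you cannot fix the data at the source: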

import re
s = '<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>, <div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>, <div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>, <div class="options mceEditable">The proteins may either be carriers or receptors only</div>, <div class="options mceEditable">It is a 3-layered lipid structure</div>'  
pattern = r'<div class="options mceEditable">.*?</div>'
matches = re.findall(pattern, s, re.U)
for m in matches:
    print(m)

Output:

<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>
<div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>
<div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>
<div class="options mceEditable">The proteins may either be carriers or receptors only</div>
<div class="options mceEditable">It is a 3-layered lipid structure</div>
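If the goal is specifically to split on the comma rather than to match each div, a lookaround-based re.split is one possible sketch. It assumes every separator is a comma, optionally followed by whitespace, sitting directly between a > and a <. The shortened sample string and the name parts are made up for illustration and are not from the question.

```python
import re

# Shortened sample in the same shape as the question's string (illustrative only).
s = ('<div class="options mceEditable">A dynamic structure, in constant movement.</div>, '
     '<div class="options mceEditable">It is a 3-layered lipid structure</div>')

# Split on a comma (plus optional whitespace) only when it is preceded by '>'
# and followed by '<'. The lookarounds match without consuming the brackets,
# so both angle brackets stay in the resulting pieces.
parts = re.split(r'(?<=>),\s*(?=<)', s)

for p in parts:
    print(p)
```

The comma inside the first div's text is left alone because it is not preceded by >, which is the difference from a plain split(",").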

You can use BeautifulSoup:

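# pip install beautifulsoup4
import bs4
# s is the input string from the question.
# Naming the parser explicitly avoids the "no parser was explicitly specified" warning.
soup = bs4.BeautifulSoup(s, "html.parser")
divs = [str(div) for div in soup.find_all('div')]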

Output:

>>> divs
['<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>',
'<div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>',
'<div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>',
'<div class="options mceEditable">The proteins may either be carriers or receptors only</div>',
'<div class="options mceEditable">It is a 3-layered lipid structure</div>']

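# text is the input string from the question.
count = text.count("</div>")
text = text.split("</div>,")
m = 1
for i in text:
    if m < count:
        # split() consumed the closing tag and the comma, so print them back
        print(i.lstrip(), end="</div>," + '\n')
        m = m + 1
    else:
        # the last piece still has its own </div>
        print(i.lstrip(), end='\n')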
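The same split-and-restore idea can also return a list, which is closer to the expected output in the question than printing. This is only a sketch: split_divs and the sample string are hypothetical names and data, and it assumes the input does not end with a trailing "</div>,".

```python
def split_divs(text):
    """Split on '</div>,' and put the closing tag back on every piece but the last."""
    pieces = text.split("</div>,")
    # Every piece except the last lost its '</div>' to the split; restore it.
    restored = [p.strip() + "</div>" for p in pieces[:-1]]
    # The last piece was never cut, so it still ends with its own '</div>'.
    restored.append(pieces[-1].strip())
    return restored


sample = '<div class="a">one, two</div>, <div class="a">three</div>'
print(split_divs(sample))
```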
