I have a string:
<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>, <div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>, <div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>, <div class="options mceEditable">The proteins may either be carriers or receptors only</div>, <div class="options mceEditable">It is a 3-layered lipid structure</div>
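I want to split the string above on commas, but only where the comma sits between tags (">, <" or ",>"), not on the comma inside the first sentence.
Expected output: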
['<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>',
'<div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>',
'<div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>',
'<div class="options mceEditable">The proteins may either be carriers or receptors only</div>',
'<div class="options mceEditable">It is a 3-layered lipid structure</div>']
What I tried:
options = test3.split(">, <")
options=options.replace("</div'","</div>'")
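Neither of the above gives the expected result. Can anyone help?
Normally I wouldn't recommend regular expressions for anything XML/HTML-related, but since your input has been processed into a form that is no longer valid HTML, I'd say a regex is acceptable in this case if you can't fix it at the data source: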
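The split(">, <") attempt is actually close: split() removes the whole separator, so each piece loses the ">" at its end and the "<" at its start (the second line then fails because a list has no replace() method). A minimal sketch that puts those characters back, assuming the elements are always separated by exactly ">, <":

```python
# The input string from the question.
s = ('<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>, '
     '<div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>, '
     '<div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>, '
     '<div class="options mceEditable">The proteins may either be carriers or receptors only</div>, '
     '<div class="options mceEditable">It is a 3-layered lipid structure</div>')

# split() consumes the whole separator, including the ">" and "<" we want to keep.
parts = s.split(">, <")

# Restore the "<" removed from the start of every piece except the first,
# and the ">" removed from the end of every piece except the last.
options = []
for i, part in enumerate(parts):
    if i > 0:
        part = "<" + part
    if i < len(parts) - 1:
        part = part + ">"
    options.append(part)

for o in options:
    print(o)
```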
import re
s = '<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>, <div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>, <div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>, <div class="options mceEditable">The proteins may either be carriers or receptors only</div>, <div class="options mceEditable">It is a 3-layered lipid structure</div>'
pattern = r'<div class="options mceEditable">.*?</div>'
matches = re.findall(pattern, s, re.U)
for m in matches:
    print(m)
Output:
<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>
<div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>
<div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>
<div class="options mceEditable">The proteins may either be carriers or receptors only</div>
<div class="options mceEditable">It is a 3-layered lipid structure</div>
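If you would rather split than match, re.split() with lookarounds can express the ">, <" condition from the question directly. A sketch, assuming the elements are separated by a comma plus optional whitespace, and that no comma inside the element text sits between a ">" and a "<":

```python
import re

# The input string from the question.
s = ('<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>, '
     '<div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>, '
     '<div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>, '
     '<div class="options mceEditable">The proteins may either be carriers or receptors only</div>, '
     '<div class="options mceEditable">It is a 3-layered lipid structure</div>')

# Split on a comma only when it is preceded by ">" and followed by optional
# whitespace and then "<". The lookbehind (?<=>) and lookahead (?=<) check
# those characters without consuming them, so every tag stays intact.
options = re.split(r'(?<=>),\s*(?=<)', s)

print(options)
```

The comma in "dynamic structure, and" is not split because it is preceded by "e", not ">".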
You can use BeautifulSoup:
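# pip install beautifulsoup4
import bs4

# s is the input string from the question; name the parser explicitly
soup = bs4.BeautifulSoup(s, "html.parser")
divs = [str(div) for div in soup.find_all('div')]
Output: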
>>> divs
['<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>',
'<div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>',
'<div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>',
'<div class="options mceEditable">The proteins may either be carriers or receptors only</div>',
'<div class="options mceEditable">It is a 3-layered lipid structure</div>']
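Alternatively, without any imports, split on "</div>," and print the removed "</div>," back after every piece except the last:
text = s  # the input string from the question
count = text.count("</div>")
text = text.split("</div>,")
m = 1
for i in text:
    if m < count:
        # strip() drops the space left after each "</div>," separator
        print(i.strip(), end="</div>," + "\n")
        m = m + 1
    else:
        print(i.strip(), end="\n")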
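The loop above only prints the pieces. To get the list shown in the expected output instead, the same split can feed a list comprehension. A sketch, assuming the elements are separated by "</div>," and the last element is not followed by a comma:

```python
# The input string from the question.
s = ('<div class="options mceEditable">The membrane is a dynamic structure, and its constituents are in constant movement.</div>, '
     '<div class="options mceEditable">The lipids component of the membrane constitutes a bilayer of hydrophilic ends</div>, '
     '<div class="options mceEditable">The lipid content of the membrane is more than that of the protein</div>, '
     '<div class="options mceEditable">The proteins may either be carriers or receptors only</div>, '
     '<div class="options mceEditable">It is a 3-layered lipid structure</div>')

# Split on the separator and strip the leading space from each piece.
pieces = [p.strip() for p in s.split("</div>,")]

# split() removed "</div>" from every piece except the last one,
# which never had a trailing comma, so re-append it to the others.
options = [p + "</div>" for p in pieces[:-1]] + pieces[-1:]

print(options)
```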