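Honestly, I'm just stumped. The goal is to split on a wrapper, but not on that same wrapper when it appears inside text that is already wrapped.
Take the following string: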
s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
It should result in this list:
['something','{','now I am wrapped {I should not cause splitting} I am still wrapped','}','something else']
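The simplest thing I tried was findall, just to see how it would behave. Since a regular expression has no memory, it doesn't take the wrapping into account, so the match ends as soon as it finds the first closing brace. Here is what happens: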
>>> s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
>>> re.findall(r'{.*?}',s)
['{now I am wrapped {I should not cause splitting}']
Any ideas on how I could make it recognize whether a brace is part of an inner wrapper, and ignore it if so?
import re
s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
m = re.search(r'(.*)({)(.*?{.*?}.*?)(})(.*)', s)
print(m.groups())
New answer:
import re
s = 'something{now I am wrapped {I should {not cause} splitting} I am still wrapped}something else'
m = re.search(r'([^{]*)({)(.*)(})([^}]*)', s)
print(m.groups())
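To see what the greedy pattern above returns, and where it seems to stop being useful, here is a small check. The helper name `split_greedy` and the second input string (two separate top-level groups) are my own, not from the question. With that second input, the greedy `.*` runs from the first `{` to the last `}`, so both groups end up merged.

```python
import re

# Same pattern as in the answer above, compiled once.
PATTERN = re.compile(r'([^{]*)({)(.*)(})([^}]*)')

def split_greedy(text):
    """Return the five groups of the greedy pattern, or None if nothing matches."""
    m = PATTERN.search(text)
    return m.groups() if m else None

# One outer wrapper with nested wrappers inside: split at the outer pair only.
nested = 'something{now I am wrapped {I should {not cause} splitting} I am still wrapped}something else'
print(split_greedy(nested))

# Hypothetical input with two top-level wrappers: the middle group becomes 'b}c{d'.
two_groups = 'a{b}c{d}e'
print(split_greedy(two_groups))
```

So this pattern looks fine when there is exactly one outer wrapper, but it is probably not what you want once several top-level wrappers can appear in the same string.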
I'm not sure whether this will always suit your needs, but you could use partition and rpartition, for example:
In [26]: s_1 = s.partition('{')
In [27]: s_1
Out[27]:
('something',
'{',
'now I am wrapped {I should not cause splitting} I am still wrapped}something else')
In [30]: s_2 = s_1[-1].rpartition('}')
In [31]: s_2
Out[31]:
('now I am wrapped {I should not cause splitting} I am still wrapped',
'}',
'something else')
In [34]: s_out = s_1[0:-1] + s_2
In [35]: s_out
Out[35]:
('something',
'{',
'now I am wrapped {I should not cause splitting} I am still wrapped',
'}',
'something else')
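The interactive steps above can be folded into one small function. This is only a sketch: the name `split_outer` and its default arguments are made up here, and like the session above it assumes a single outer wrapper, splitting at the first opener and the last closer.

```python
def split_outer(text, opener='{', closer='}'):
    """Split text at the first opener and the last closer.

    Returns a 5-tuple (before, opener, inside, closer, after).
    Missing separators come back as empty strings, as with str.partition.
    """
    # Everything before the first opener, the opener itself, and the rest.
    head, open_sep, rest = text.partition(opener)
    # Split the rest at the LAST closer so nested wrappers stay inside.
    inner, close_sep, tail = rest.rpartition(closer)
    return (head, open_sep, inner, close_sep, tail)

s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
print(split_outer(s))
```

Because partition and rpartition never raise when the separator is missing, a string without any wrapper simply comes back with empty strings in the remaining slots.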
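Based on all the responses, I decided to just write a function that takes the string and the wrappers and builds the output list by brute-force iteration: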
def f(string, wrap1, wrap2):
    wrapped = False
    inner = 0
    count = 0
    holds = ['']
    for i, c in enumerate(string):
        if c == wrap1 and not wrapped:
            count += 2
            wrapped = True
            holds.append(wrap1)
            holds.append('')
        elif c == wrap1 and wrapped:
            inner += 1
            holds[count] += c
        elif c == wrap2 and wrapped and inner > 0:
            inner -= 1
            holds[count] += c
        elif c == wrap2 and wrapped and inner == 0:
            wrapped = False
            count += 2
            holds.append(wrap2)
            holds.append('')
        else:
            holds[count] += c
    return holds
And this shows it working:
>>> s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
>>> f(s,'{','}')
['something', '{', 'now I am wrapped {I should not cause splitting} I am still wrapped', '}', 'something else']
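The same idea can be written with a single depth counter. The variant below is a sketch rather than a replacement: the name `split_top_level` is invented here, and the `ValueError` on unbalanced input is an addition that the function above does not have. It assumes the opener and closer are two different single characters.

```python
def split_top_level(text, opener='{', closer='}'):
    """Split text on top-level wrappers only; nested ones stay in the text."""
    parts = ['']
    depth = 0  # how many wrappers are currently open
    for ch in text:
        if ch == opener:
            depth += 1
            if depth == 1:
                # Outermost opener: emit it and start a fresh element.
                parts.extend([opener, ''])
                continue
        elif ch == closer:
            depth -= 1
            if depth < 0:
                raise ValueError('closing wrapper without a matching opener')
            if depth == 0:
                # Outermost closer: emit it and start a fresh element.
                parts.extend([closer, ''])
                continue
        # Ordinary character, or a nested wrapper: keep it in the current element.
        parts[-1] += ch
    if depth != 0:
        raise ValueError('opening wrapper was never closed')
    return parts

s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
print(split_top_level(s))
```

Tracking only the depth avoids the separate wrapped/inner/count variables, at the cost of being a little less explicit about which case is being handled.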
You could use the Scanner of the re module to solve this:
Using the following list of strings as test cases:
l = ['something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else',
'something{now I am wrapped} here {and there} listen',
'something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything',
'something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything']
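Create a class in which I keep, as state, the number of currently open curly braces along with the text found between the outer pair. It has three methods: one for matching an opening brace, one for a closing brace, and one for the text between them. Depending on whether the stack (the opened_cb variable) is empty, I perform different actions: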
class Cb():
    def __init__(self, results=None):
        self.results = []
        self.opened_cb = 0

    def s_text_until_cb(self, scanner, token):
        if self.opened_cb == 0:
            return token
        else:
            self.results.append(token)
            return None

    def s_opening_cb(self, scanner, token):
        self.opened_cb += 1
        if self.opened_cb == 1:
            return token
        self.results.append(token)
        return None

    def s_closing_cb(self, scanner, token):
        self.opened_cb -= 1
        if self.opened_cb == 0:
            t = [''.join(self.results), token]
            self.results.clear()
            return t
        else:
            self.results.append(token)
            return None
Finally, I create the Scanner and join the results into a flat list:
for s in l:
    results = []
    cb = Cb()
    scanner = re.Scanner([
        (r'[^{}]+', cb.s_text_until_cb),
        (r'[{]', cb.s_opening_cb),
        (r'[}]', cb.s_closing_cb),
    ])
    r = scanner.scan(s)[0]
    for elem in r:
        if isinstance(elem, list):
            results.extend(elem)
        else:
            results.append(elem)
    print('Original string --> {0}\nResult --> {1}\n\n'.format(s, results))
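As far as I know, re.Scanner is not described in the official re documentation, so a tiny standalone sketch of its behaviour may help. The callback names, token labels and input string below are made up for illustration. It shows the two things the code above relies on: a callback that returns None adds nothing to the result list, and scan() returns a pair made of the collected results and whatever text could not be matched.

```python
import re

def number(scanner, token):
    # Whatever a callback returns is appended to the result list.
    return ('NUM', int(token))

def word(scanner, token):
    return ('WORD', token)

def skip(scanner, token):
    # Returning None means "matched, but contribute nothing".
    return None

demo_scanner = re.Scanner([
    (r'[0-9]+', number),
    (r'[a-z]+', word),
    (r'\s+', skip),
])

# scan() stops at the first position no pattern matches ('!' here)
# and hands back the unmatched tail as the second item.
tokens, remainder = demo_scanner.scan('abc 42 def!')
print(tokens)
print(remainder)
```

In the answer's code only the first item of that pair is used (scanner.scan(s)[0]), which works because every character of the test strings is covered by one of the three patterns.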
Here is the complete program, followed by the result of running it:
import re

l = ['something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else',
     'something{now I am wrapped} here {and there} listen',
     'something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything',
     'something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything']

class Cb():
    def __init__(self, results=None):
        self.results = []
        self.opened_cb = 0

    def s_text_until_cb(self, scanner, token):
        if self.opened_cb == 0:
            return token
        else:
            self.results.append(token)
            return None

    def s_opening_cb(self, scanner, token):
        self.opened_cb += 1
        if self.opened_cb == 1:
            return token
        # Nested opening brace: keep it as part of the wrapped text.
        self.results.append(token)
        return None

    def s_closing_cb(self, scanner, token):
        self.opened_cb -= 1
        if self.opened_cb == 0:
            t = [''.join(self.results), token]
            self.results.clear()
            return t
        else:
            self.results.append(token)
            return None

for s in l:
    results = []
    cb = Cb()
    scanner = re.Scanner([
        (r'[^{}]+', cb.s_text_until_cb),
        (r'[{]', cb.s_opening_cb),
        (r'[}]', cb.s_closing_cb),
    ])
    r = scanner.scan(s)[0]
    for elem in r:
        if isinstance(elem, list):
            results.extend(elem)
        else:
            results.append(elem)
    print('Original string --> {0}\nResult --> {1}\n\n'.format(s, results))
Run it like this:
python3 script.py
That yields:
Original string --> something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else
Result --> ['something', '{', 'now I am wrapped {I should not cause splitting} I am still wrapped', '}', 'everything else']
Original string --> something{now I am wrapped} here {and there} listen
Result --> ['something', '{', 'now I am wrapped', '}', ' here ', '{', 'and there', '}', ' listen']
Original string --> something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything
Result --> ['something', '{', 'now I am wrapped {I should {not} cause splitting} I am still wrapped', '}', 'everything']
Original string --> something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything
Result --> ['something', '{', 'now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped', '}', 'everything']