Regex to split on a wrapping pattern



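Honestly, I'm just stumped. The goal is to split on a wrapper, but not when that same wrapper appears inside something that is already wrapped.

Take the following string: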

s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'

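This is the result list I want:

['something','{','now I am wrapped {I should not cause splitting} I am still wrapped','}','something else']

The simplest thing I tried was findall, just to see how it behaves. Since a regular expression has no memory, it does not keep track of the wrapping, so the match ends as soon as it finds another closing brace. Here is what happens: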

>>> s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
>>> re.findall(r'{.*?}',s)
['{now I am wrapped {I should not cause splitting}']

Any ideas on how I can get it to recognize whether a brace is part of an inner wrapper?

import re

s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
m = re.search(r'(.*)({)(.*?{.*?}.*?)(})(.*)', s)
print(m.groups())

New answer:

s = 'something{now I am wrapped {I should {not cause} splitting} I am still wrapped}something else'
m = re.search(r'([^{]*)({)(.*)(})([^}]*)', s)
print(m.groups())
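
To see where the greedy pattern above helps and where it stops helping, here is a minimal sketch. The test strings `nested` and `two_groups` are made up for illustration, and the braces are escaped for readability. The greedy `(.*)` runs to the last closing brace, so arbitrary nesting is fine, but two separate top-level groups get merged into one.

```python
import re

# Same shape as the pattern in the answer above, with the braces escaped.
pattern = re.compile(r'([^{]*)(\{)(.*)(\})([^}]*)')

# Hypothetical input with several nesting levels: handled as wanted.
nested = 'a{b {c {d} e} f}g'
nested_groups = pattern.search(nested).groups()
print(nested_groups)
# ('a', '{', 'b {c {d} e} f', '}', 'g')

# Hypothetical input with two top-level groups: they are merged,
# because the greedy (.*) extends to the LAST closing brace.
two_groups = 'a{b}c{d}e'
merged_groups = pattern.search(two_groups).groups()
print(merged_groups)
# ('a', '{', 'b}c{d', '}', 'e')
```

So this is probably good enough when the string is known to contain exactly one outer wrapper, and not otherwise.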
I'm not sure whether this will always meet your needs, but you could use partition and rpartition, for example:

In [26]: s_1 = s.partition('{')
In [27]: s_1
Out[27]: 
('something',
 '{',
 'now I am wrapped {I should not cause splitting} I am still wrapped}something else')
In [30]: s_2 = s_1[-1].rpartition('}')
In [31]: s_2
Out[31]: 
('now I am wrapped {I should not cause splitting} I am still wrapped',
 '}',
 'something else')
In [34]: s_out = s_1[0:-1] + s_2
In [35]: s_out
Out[35]: 
('something',
 '{',
 'now I am wrapped {I should not cause splitting} I am still wrapped',
 '}',
 'something else')
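
The session above can be folded into a small helper. This is only a sketch: the name `split_outer` and its default arguments are my own, and like the session it assumes there is a single outer wrapper (first opener to last closer).

```python
def split_outer(s, open_ch='{', close_ch='}'):
    """Split s around its outermost wrapper: first open_ch to last close_ch."""
    head, opener, rest = s.partition(open_ch)
    if not opener:
        # No opening wrapper at all: nothing to split.
        return [s]
    body, closer, tail = rest.rpartition(close_ch)
    if not closer:
        # rpartition returns an empty separator when close_ch is absent.
        raise ValueError('no closing wrapper found')
    return [head, opener, body, closer, tail]


s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
print(split_outer(s))
# ['something', '{', 'now I am wrapped {I should not cause splitting} I am still wrapped', '}', 'something else']
```

With two separate top-level groups, such as 'a{b}c{d}e', this returns everything between the first opener and the last closer as one body, so it is not a general solution.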

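Based on all the responses, I decided to just write a function that takes the string and the wrappers and builds the output list by brute-force iteration: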

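def f(string, wrap1, wrap2):
    wrapped = False   # True while inside a top-level wrapper
    inner = 0         # depth of nested wrappers inside the top-level one
    count = 0         # index of the piece currently being built
    holds = ['']
    for c in string:
        if c == wrap1 and not wrapped:
            # Top-level opener: emit it and start a new piece for the body.
            count += 2
            wrapped = True
            holds.append(wrap1)
            holds.append('')
        elif c == wrap1 and wrapped:
            # Nested opener: keep it as part of the body.
            inner += 1
            holds[count] += c
        elif c == wrap2 and wrapped and inner > 0:
            # Nested closer: keep it as part of the body.
            inner -= 1
            holds[count] += c
        elif c == wrap2 and wrapped and inner == 0:
            # Top-level closer: emit it and start a new piece for what follows.
            wrapped = False
            count += 2
            holds.append(wrap2)
            holds.append('')
        else:
            holds[count] += c
    return holds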

Here it is working:

>>> s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
>>> f(s,'{','}')
['something', '{', 'now I am wrapped {I should not cause splitting} I am still wrapped', '}', 'something else']
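
As far as I can tell, the function above accepts unbalanced input silently. Below is a sketch of the same depth-counting idea that raises on unmatched wrappers instead. The name `split_top_level`, the default wrappers, and the choice of ValueError are my own assumptions, not part of the original function.

```python
def split_top_level(text, open_ch='{', close_ch='}'):
    """Split text on top-level wrappers only, keeping nested ones in the body."""
    parts = []   # finished pieces, in order
    buf = []     # characters of the piece currently being built
    depth = 0    # current nesting depth
    for ch in text:
        if ch == open_ch:
            if depth == 0:
                # Top-level opener: flush the text before it, then emit it.
                parts.append(''.join(buf))
                parts.append(ch)
                buf = []
            else:
                buf.append(ch)
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                raise ValueError('unmatched closing wrapper')
            depth -= 1
            if depth == 0:
                # Top-level closer: flush the body, then emit the closer.
                parts.append(''.join(buf))
                parts.append(ch)
                buf = []
            else:
                buf.append(ch)
        else:
            buf.append(ch)
    if depth != 0:
        raise ValueError('unmatched opening wrapper')
    parts.append(''.join(buf))
    return parts


print(split_top_level('a{b {c} d}e{f}g'))
# ['a', '{', 'b {c} d', '}', 'e', '{', 'f', '}', 'g']
```

A handy property to check is that ''.join(split_top_level(x)) == x for any balanced input, since no character is dropped.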
You can use the Scanner of the re module to solve this problem.

Using the following list of strings as a test:

l = ['something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else',
     'something{now I am wrapped} here {and there} listen',
     'something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything',
     'something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything']

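I create a class that keeps state: the number of curly braces currently open, plus the text collected between the outermost pair. It has three methods: one for matching an opening curly brace, one for a closing curly brace, and one for the text in between. Depending on whether the stack (the opened_cb variable) is empty, I take different actions: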

class Cb():
    def __init__(self, results=None):
        self.results = []
        self.opened_cb = 0
    def s_text_until_cb(self, scanner, token):
        if self.opened_cb == 0:
            return token
        else:
            self.results.append(token)
            return None
    def s_opening_cb(self, scanner, token):
        self.opened_cb += 1
        if self.opened_cb == 1:
            return token
        self.results.append(token)
        return None
    def s_closing_cb(self, scanner, token):
        self.opened_cb -= 1
        if self.opened_cb == 0:
            t = [''.join(self.results), token]
            self.results.clear()
            return t
        else:
            self.results.append(token)
            return None
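
For readers who have not met re.Scanner before, here is a minimal sketch of how it drives callbacks like the methods above. As far as I know, the class is not covered in the official re documentation, so treat the details as observed behavior rather than a guaranteed API. The token labels and the sample string are made up for illustration.

```python
import re

# Each lexicon entry is (pattern, action). The action is called with
# (scanner, matched_text); whatever it returns is added to the results.
labelled = re.Scanner([
    (r'[^{}]+', lambda sc, tok: ('TEXT', tok)),
    (r'\{', lambda sc, tok: ('OPEN', tok)),
    (r'\}', lambda sc, tok: ('CLOSE', tok)),
])

# scan() returns a pair: the list of results and any unscanned remainder.
tokens, remainder = labelled.scan('a{b{c}}d')
print(tokens)
print(repr(remainder))

# An action that returns None contributes nothing to the results,
# which is what the class above relies on while inside a wrapper.
drop_braces = re.Scanner([
    (r'[^{}]+', lambda sc, tok: tok),
    (r'[{}]', lambda sc, tok: None),
])
kept, _ = drop_braces.scan('a{b{c}}d')
print(kept)
# ['a', 'b', 'c', 'd']
```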

Finally, I create the Scanner and join the results into a flat list:

for s in l:
    results = []
    cb = Cb()
    scanner = re.Scanner([
        (r'[^{}]+', cb.s_text_until_cb),
        (r'[{]', cb.s_opening_cb),
        (r'[}]', cb.s_closing_cb),
    ])
    r = scanner.scan(s)[0]
    for elem in r:
        if isinstance(elem, list):
            results.extend(elem)
        else:
            results.append(elem)
    print('Original string --> {0}\nResult --> {1}\n\n'.format(s, results))

Here is the full program and its output:

import re
l = ['something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else',
     'something{now I am wrapped} here {and there} listen',
     'something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything',
     'something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything']

class Cb():
    def __init__(self, results=None):
        self.results = []
        self.opened_cb = 0
    def s_text_until_cb(self, scanner, token):
        if self.opened_cb == 0:
            return token
        else:
            self.results.append(token)
            return None
    def s_opening_cb(self, scanner, token):
        self.opened_cb += 1
        if self.opened_cb == 1:
            return token
        self.results.append(token)
        return None
    def s_closing_cb(self, scanner, token):
        self.opened_cb -= 1
        if self.opened_cb == 0:
            t = [''.join(self.results), token]
            self.results.clear()
            return t
        else:
            self.results.append(token)
            return None
for s in l:
    results = []
    cb = Cb()
    scanner = re.Scanner([
        (r'[^{}]+', cb.s_text_until_cb),
        (r'[{]', cb.s_opening_cb),
        (r'[}]', cb.s_closing_cb),
    ])
    r = scanner.scan(s)[0]
    for elem in r:
        if isinstance(elem, list):
            results.extend(elem)
        else:
            results.append(elem)
    print('Original string --> {0}\nResult --> {1}\n\n'.format(s, results))

Run it like this:

python3 script.py

That yields:

Original string --> something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else
Result --> ['something', '{', 'now I am wrapped {I should not cause splitting} I am still wrapped', '}', 'everything else']

Original string --> something{now I am wrapped} here {and there} listen
Result --> ['something', '{', 'now I am wrapped', '}', ' here ', '{', 'and there', '}', ' listen']

Original string --> something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything
Result --> ['something', '{', 'now I am wrapped {I should {not} cause splitting} I am still wrapped', '}', 'everything']

Original string --> something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything
Result --> ['something', '{', 'now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped', '}', 'everything']
