Regex to split on a wrapper pattern

Honestly, I'm just stumped. The goal is to split on a wrapper, but not on that same wrapper when it appears inside the wrapped content.

Take the following string:

s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'

It should produce the list ['something','{','now I am wrapped {I should not cause splitting} I am still wrapped','}','something else'].

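The simplest thing I tried was findall, just to see how it would behave. But since a regular expression has no memory, it doesn't take the nesting into account, so the match ends as soon as it finds the first closing brace. Here is what happens: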

>>> s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
>>> re.findall(r'{.*?}',s)
['{now I am wrapped {I should not cause splitting}']

Any ideas on how I can get it to recognize whether a brace is part of an inner wrapper?

import re

s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
m = re.search(r'(.*)({)(.*?{.*?}.*?)(})(.*)', s)
print(m.groups())

New answer:

s = 'something{now I am wrapped {I should {not cause} splitting} I am still wrapped}something else'
m = re.search(r'([^{]*)({)(.*)(})([^}]*)', s)
print(m.groups())
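
One caveat that may be worth checking before relying on the greedy pattern above: it pairs the first `{` with the last `}` in the string, so it only gives the intended split when there is a single top-level wrapper. The following is a minimal sketch of that behavior; the helper name `split_greedy` and the test string `'a{b}c{d}e'` are illustrative assumptions, not part of the answer.

```python
import re

# Same pattern as in the answer above, compiled once.
PATTERN = re.compile(r'([^{]*)({)(.*)(})([^}]*)')

def split_greedy(s):
    """Return the five captured groups as a list, or None if nothing matches."""
    m = PATTERN.search(s)
    return list(m.groups()) if m else None

nested = 'something{now I am wrapped {I should {not cause} splitting} I am still wrapped}something else'
two_groups = 'a{b}c{d}e'  # hypothetical input with two top-level wrappers

print(split_greedy(nested))      # one outer wrapper: nesting is kept intact
print(split_greedy(two_groups))  # middle group swallows 'b}c{d'
```

With two separate wrappers the middle group spans both of them, so an input like that needs one of the counting approaches instead.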

I'm not sure this will always meet your needs, but you can use partition and rpartition, for example:

In [26]: s_1 = s.partition('{')
In [27]: s_1
Out[27]: 
('something',
 '{',
 'now I am wrapped {I should not cause splitting} I am still wrapped}something else')
In [30]: s_2 = s_1[-1].rpartition('}')
In [31]: s_2
Out[31]: 
('now I am wrapped {I should not cause splitting} I am still wrapped',
 '}',
 'something else')
In [34]: s_out = s_1[0:-1] + s_2
In [35]: s_out
Out[35]: 
('something',
 '{',
 'now I am wrapped {I should not cause splitting} I am still wrapped',
 '}',
 'something else')
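
The interactive steps above could be folded into a small helper. This is only a sketch under stated assumptions: the name `split_outer`, its default arguments, and the choice to return the string unsplit when either wrapper is missing are all illustrative, not part of the answer.

```python
def split_outer(s, open_='{', close='}'):
    """Split s around its first `open_` and its last `close`.

    Returns [before, open_, inside, close, after], or [s] when either
    wrapper is missing (an assumed fallback, chosen for illustration).
    """
    head, found_open, rest = s.partition(open_)
    if not found_open:
        return [s]
    body, found_close, tail = rest.rpartition(close)
    if not found_close:
        return [s]
    return [head, found_open, body, found_close, tail]

s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
print(split_outer(s))
```

Like the original session, this treats everything between the first opening and the last closing wrapper as one block, so it suits strings with a single top-level wrapper.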

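Based on all the responses, I decided to just write a function that takes the string and the two wrappers and builds the output list by brute-force iteration: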

def f(string,wrap1,wrap2):
    wrapped = False
    inner = 0
    count = 0
    holds = ['']
    for i,c in enumerate(string):
        if c == wrap1 and not wrapped:
            count += 2
            wrapped = True
            holds.append(wrap1)
            holds.append('')
        elif c == wrap1 and wrapped:
            inner += 1
            holds[count] += c
        elif c == wrap2 and wrapped and inner > 0:
            inner -= 1
            holds[count] += c
        elif c == wrap2 and wrapped and inner == 0:
            wrapped = False
            count += 2
            holds.append(wrap2)
            holds.append('')
        else:
            holds[count] += c
    return holds

This shows that it works:

>>> s = 'something{now I am wrapped {I should not cause splitting} I am still wrapped}something else'
>>> f(s,'{','}')
['something', '{', 'now I am wrapped {I should not cause splitting} I am still wrapped', '}', 'something else']
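
The same idea can be written with a single depth counter. The sketch below is a variant, not the function above: the name `split_wrapped` is hypothetical, it assumes the two wrappers are distinct single characters, and unlike `f` it raises `ValueError` on unbalanced input instead of passing it through.

```python
def split_wrapped(text, open_='{', close='}'):
    """Split text on top-level wrappers, keeping nested wrappers as plain text."""
    parts = ['']
    depth = 0  # number of wrappers currently open
    for ch in text:
        if ch == open_:
            depth += 1
            if depth == 1:           # top-level opener: start a new segment
                parts += [ch, '']
                continue
        elif ch == close:
            if depth == 0:
                raise ValueError('unmatched closing wrapper')
            depth -= 1
            if depth == 0:           # top-level closer: start a new segment
                parts += [ch, '']
                continue
        parts[-1] += ch              # ordinary text or a nested wrapper
    if depth:
        raise ValueError('unmatched opening wrapper')
    return parts

print(split_wrapped('something{now I am wrapped} here {and there} listen'))
```

Because the counter returns to zero after each top-level wrapper, this also handles several wrapped sections in one string.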

You can use the Scanner class of the re module to solve this:

Using the following list of strings as a test:

l = ['something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else',
     'something{now I am wrapped} here {and there} listen',
     'something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything',
     'something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything']

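I create a class that keeps state: the number of currently open curly braces, plus the text collected between the outermost pair. It has three methods: one to match an opening brace, one for a closing brace, and one for the text in between. Depending on whether the stack (the opened_cb variable) is empty, I take different actions: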

class Cb():
    def __init__(self, results=None):
        self.results = []
        self.opened_cb = 0
    def s_text_until_cb(self, scanner, token):
        if self.opened_cb == 0:
            return token
        else:
            self.results.append(token)
            return None
    def s_opening_cb(self, scanner, token):
        self.opened_cb += 1
        if self.opened_cb == 1:
            return token
        self.results.append(token)
        return None
    def s_closing_cb(self, scanner, token):
        self.opened_cb -= 1
        if self.opened_cb == 0:
            t = [''.join(self.results), token]
            self.results.clear()
            return t
        else:
            self.results.append(token)
            return None

Finally, I create the Scanner and flatten the results into a plain list:

for s in l:
    results = []
    cb = Cb()
    scanner = re.Scanner([
        (r'[^{}]+', cb.s_text_until_cb),
        (r'[{]', cb.s_opening_cb),
        (r'[}]', cb.s_closing_cb),
    ])
    r = scanner.scan(s)[0]
    for elem in r:
        if isinstance(elem, list):
            results.extend(elem)
        else:
            results.append(elem)
    print('Original string --> {0}\nResult --> {1}\n\n'.format(s, results))

Here is the complete program and the result of running it:

import re
l = ['something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else',
     'something{now I am wrapped} here {and there} listen',
     'something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything',
     'something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything']

class Cb():
    def __init__(self, results=None):
        self.results = []
        self.opened_cb = 0
    def s_text_until_cb(self, scanner, token):
        if self.opened_cb == 0:
            return token
        else:
            self.results.append(token)
            return None
    def s_opening_cb(self, scanner, token):
        self.opened_cb += 1
        if self.opened_cb == 1:
            return token
        # Nested opening brace: keep it as part of the wrapped text.
        self.results.append(token)
        return None
    def s_closing_cb(self, scanner, token):
        self.opened_cb -= 1
        if self.opened_cb == 0:
            t = [''.join(self.results), token]
            self.results.clear()
            return t
        else:
            self.results.append(token)
            return None
for s in l:
    results = []
    cb = Cb()
    scanner = re.Scanner([
        (r'[^{}]+', cb.s_text_until_cb),
        (r'[{]', cb.s_opening_cb),
        (r'[}]', cb.s_closing_cb),
    ])
    r = scanner.scan(s)[0]
    for elem in r:
        if isinstance(elem, list):
            results.extend(elem)
        else:
            results.append(elem)
    print('Original string --> {0}\nResult --> {1}\n\n'.format(s, results))

Run it like this:

python3 script.py

That yields:

Original string --> something{now I am wrapped {I should not cause splitting} I am still wrapped}everything else
Result --> ['something', '{', 'now I am wrapped {I should not cause splitting} I am still wrapped', '}', 'everything else']

Original string --> something{now I am wrapped} here {and there} listen
Result --> ['something', '{', 'now I am wrapped', '}', ' here ', '{', 'and there', '}', ' listen']

Original string --> something{now I am wrapped {I should {not} cause splitting} I am still wrapped}everything
Result --> ['something', '{', 'now I am wrapped {I should {not} cause splitting} I am still wrapped', '}', 'everything']

Original string --> something{now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped}everything
Result --> ['something', '{', 'now {I {am}} wrapped {I should {{{not}}} cause splitting} I am still wrapped', '}', 'everything']
