Regex that splits a string while respecting nested parentheses

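I know there are many answers to questions about splitting a string that contains parentheses, but none of them handle nesting (recursion). Take the string `1 2 3 (test 0, test 0) (test (0 test) 0)`:
the regex `\s(?![^(]*\))` returns "1", "2", "3", "(test 0, test 0)", "(test", "(0 test) 0)".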
The regex I'm looking for would return
"1", "2", "3", "(test 0, test 0)", "(test (0 test) 0)"
or
"1", "2", "3", "test 0, test 0", "test (0 test) 0"
which would let me apply it again, recursively, to the results until no parentheses are left.
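Ideally it would also respect escaped parentheses, but I'm not that advanced with regex myself and only know the basics.
Does anyone know how to approach this?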
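The behaviour described above can be reproduced with `re.split`. This is a minimal check, assuming the pattern from the question is `\s(?![^(]*\))` (the backslashes were lost in the text). The lookahead only looks ahead as far as the next `(`, so the space after `(test` is split on: a `(` comes before any `)`, even though that space is inside an outer group.

```python
import re

s = "1 2 3 (test 0, test 0) (test (0 test) 0)"
# Split on whitespace unless a ')' appears before the next '('.
parts = re.split(r"\s(?![^(]*\))", s)
print(parts)
# ['1', '2', '3', '(test 0, test 0)', '(test', '(0 test) 0)']
```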

Using only a regex for this task might work, but it isn't straightforward.

Another option is to write a simple algorithm that keeps track of the parentheses in the string:

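- Split the string at every parenthesis, keeping the delimiters in the result (e.g. with `re.split` and a capturing group)
- Keep counters that track the parentheses: `start_parens_count` for `(` and `end_parens_count` for `)`
- Using the counters, either split on whitespace or append the current piece to a temporary variable (`term`)
- When the outermost parenthesis has been closed, append `term` to the result list and reset the counters and the temporary variable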

Here's an example:

```python
import re

string = "1 2 3 (test 0, test 0) (test (0 test) 0)"

result, start_parens_count, end_parens_count, term = [], 0, 0, ""
# A capturing group in re.split keeps "(" and ")" as separate items.
for x in re.split(r"([()])", string):
    if x == "(":
        if start_parens_count > 0:  # nested "(": keep it inside the term
            term += "("
        start_parens_count += 1
    elif x == ")":
        end_parens_count += 1
        if end_parens_count == start_parens_count:  # outermost group closed
            result.append(term)
            end_parens_count, start_parens_count, term = 0, 0, ""
        else:  # nested ")": keep it inside the term
            term += ")"
    elif start_parens_count > end_parens_count:
        term += x  # inside parentheses: keep the text, spaces included
    else:
        result.extend(x.split())  # outside parentheses: split on whitespace

print(result)
# ['1', '2', '3', 'test 0, test 0', 'test (0 test) 0']
```

Not very elegant, but it works.

You can install the `regex` module with `pip install regex` and use:

```python
import regex

text = "1 2 3 (test 0, test 0) (test (0 test) 0)"
matches = [match.group() for match in regex.finditer(r"(?:(\((?>[^()]+|(?1))*\))|\S)+", text)]
print(matches)
# => ['1', '2', '3', '(test 0, test 0)', '(test (0 test) 0)']
```

See the online Python demo and the regex demo. The regex matches:

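- `(?:` - start of a non-capturing group:
  - `(\((?>[^()]+|(?1))*\))` - any text between (nested) balanced parentheses
  - `|` - or
  - `\S` - any non-whitespace character
- `)+` - end of the group, repeated one or more times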
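Python's built-in `re` module does not support recursive patterns such as `(?1)`, which is why the pattern above needs the third-party `regex` module. If you would rather stay with the standard library, the same top-level split can be sketched with a depth counter. The function name `split_top_level` is hypothetical, and treating a backslash as escaping the next character is an assumption added to cover the escaped-parentheses wish from the question. For balanced input without backslashes it should return the same tokens as the regex; unbalanced input raises `ValueError` instead.

```python
def split_top_level(text):
    """Split on whitespace outside any parentheses, keeping each balanced
    (...) group intact, outer parentheses included.
    Assumption: a backslash escapes the next character, so \\( and \\)
    are kept as literal text and do not change the nesting depth."""
    tokens, current, depth = [], [], 0
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Keep the backslash and the escaped character verbatim.
            current.append(ch)
            current.append(next(chars, ""))
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1
        elif ch.isspace() and depth == 0:
            # Top-level whitespace ends the current token.
            if current:
                tokens.append("".join(current))
                current = []
            continue
        current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '('")
    if current:
        tokens.append("".join(current))
    return tokens


print(split_top_level("1 2 3 (test 0, test 0) (test (0 test) 0)"))
# ['1', '2', '3', '(test 0, test 0)', '(test (0 test) 0)']
```

To apply it recursively as the question describes, strip a token's outer parentheses (`tok[1:-1]`) and call `split_top_level` on the inside again.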

Alternatively, you can use pyparsing.

```python
import pyparsing as pp

# `^` builds an Or, which picks the longest matching alternative, so a whole
# "(...)" group wins over the shorter \S+ match such as "(test".
pattern = pp.ZeroOrMore(pp.Regex(r'\S+') ^ pp.original_text_for(pp.nested_expr('(', ')')))

# Tests
string = '1 2 3 (test 0, test 0) (test (0 test) 0)'
result = pattern.parse_string(string).as_list()
answer = ['1', '2', '3', '(test 0, test 0)', '(test (0 test) 0)']
assert result == answer
string = ''
result = pattern.parse_string(string).as_list()
answer = []
assert result == answer
string = 'a'
result = pattern.parse_string(string).as_list()
answer = ['a']
assert result == answer
string = ' a (1) ! '
result = pattern.parse_string(string).as_list()
answer = ['a', '(1)', '!']
assert result == answer
string = ' a (b) cd (e f) g hi (j (k l) m) (o p (qr (s t) u v) w (x y) z)'
result = pattern.parse_string(string).as_list()
answer = ['a', '(b)', 'cd', '(e f)', 'g', 'hi', '(j (k l) m)', '(o p (qr (s t) u v) w (x y) z)']
assert result == answer
```

* pyparsing can be installed with `pip install pyparsing`.

You can also parse all the nested parentheses directly in one pass:

```python
pattern = pp.ZeroOrMore(pp.Regex(r'\S+') ^ pp.nested_expr('(', ')'))
string = '1 2 3 (test 0, test 0) (test (0 test) 0)'
result = pattern.parse_string(string).as_list()
answer = ['1', '2', '3', ['test', '0,', 'test', '0'], ['test', ['0', 'test'], '0']]
assert result == answer
```

* In this case, whitespace acts as the delimiter inside the parentheses.
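If pyparsing isn't available, a similar nested list can be built with a stack and the standard library. This is a minimal sketch: `parse_nested` is a hypothetical name, and the tokenization (any run of characters that are neither whitespace nor parentheses is one word) is an assumption chosen so that, for this input, the output matches `answer` above. Unlike the note below, unbalanced input raises `ValueError` rather than giving unexpected results.

```python
import re


def parse_nested(text):
    """Turn words and (possibly nested) parenthesized groups into nested
    lists, e.g. "a (b c)" -> ['a', ['b', 'c']]."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    # Each token is a single parenthesis or a run of other non-space chars.
    for token in re.findall(r"[()]|[^\s()]+", text):
        if token == "(":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


print(parse_nested("1 2 3 (test 0, test 0) (test (0 test) 0)"))
# ['1', '2', '3', ['test', '0,', 'test', '0'], ['test', ['0', 'test'], '0']]
```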

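Note:

If a pair of parentheses inside () is broken (e.g. `a(b(c)`, `a(b)c)`, etc.), you will get unexpected results or an IndexError will be raised, so use this with care. (See: Python extract string in phrase)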
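One way to guard against the broken pairs mentioned in the note is to check the input before parsing it. A minimal sketch with a hypothetical `is_balanced` helper; it simply counts parentheses and does not handle escaped ones.

```python
def is_balanced(text, opener="(", closer=")"):
    """Return True if every closer matches an earlier opener."""
    depth = 0
    for ch in text:
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0  # any leftover '(' means unbalanced


print(is_balanced("1 2 3 (test 0, test 0) (test (0 test) 0)"))  # True
print(is_balanced("a(b(c)"))  # False
print(is_balanced("a(b)c)"))  # False
```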
