How do I split a bracketed list in Python using regular expressions?

I am trying to use Python's re module to split a string that represents a list. The list items are marked with brackets.

Input:

"[1]first[2]second[3]third" ... etc

Desired output:

['first', 'second', 'third',...]

My current code is:

out = re.split(r'\[(.*?)\]', thelist)

It returns the following. How do I get what I want instead?

['', '1', 'first', '2', 'second', '3', 'third',...]

You can use the \[\d+\] regex to match the numbers enclosed in [...] and then remove the empty elements:

import re
p = re.compile(r'\[\d+\]')  # a literal [, one or more digits, a literal ]
test_str = "[1]first[2]second[3]third"
print([x for x in p.split(test_str) if x])  # keep only non-empty pieces
# => ['first', 'second', 'third']

See the IDEONE demo.

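To keep the numbers in the output (e.g. ['[1]first', '[2]second', '[3]third']), you can split on a zero-width lookahead. This needs Python 3.7 or later, since that is when re.split started splitting on empty matches: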

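import re
test_str = "[1]first[2]second[3]third"
# (?!^) skips the position at the start; (?=\[\d+\]) splits just before each [n]
print(re.split(r'(?!^)(?=\[\d+\])', test_str))
# => ['[1]first', '[2]second', '[3]third']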

See this Python 3 demo.
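If you need each number and its text as a separate pair rather than one combined string, here is a minimal sketch. It assumes, as in the question, that the item text never contains `[`. It uses `re.findall` with two capturing groups, and when a pattern has more than one group, `findall` returns a list of tuples.

```python
import re

test_str = "[1]first[2]second[3]third"

# Group 1: the digits inside [...]; group 2: everything up to the next '['.
# Assumes the item text itself never contains '['.
pairs = re.findall(r'\[(\d+)\]([^\[]*)', test_str)
print(pairs)  # [('1', 'first'), ('2', 'second'), ('3', 'third')]

# Illustrative only: a dict keyed by the integer index.
items = {int(num): text for num, text in pairs}
print(items)  # {1: 'first', 2: 'second', 3: 'third'}
```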

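Your code returns the captured text because re.split includes the text of every capturing group as a separate element of the result list.

If there are capturing groups in the separator and it matches at the start of the string, the result starts with an empty string.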
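To see both effects, compare the question's pattern with the same pattern using a non-capturing group `(?:...)`. Without a capturing group, `re.split` returns only the text between the separators. The leading empty string remains in both cases, because the separator matches at position 0.

```python
import re

s = "[1]first[2]second[3]third"

# Capturing group: the captured digits are interleaved with the pieces.
with_group = re.split(r'\[(.*?)\]', s)
print(with_group)     # ['', '1', 'first', '2', 'second', '3', 'third']

# Non-capturing group: only the pieces between separators are returned.
without_group = re.split(r'\[(?:.*?)\]', s)
print(without_group)  # ['', 'first', 'second', 'third']
```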

Alternatively, to remove only the first empty element, you can use:

res = p.split(test_str)
if not res[0]:
    del res[0]
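The two approaches differ only when an item between two brackets is itself empty. Here is a sketch with a made-up input in which item 2 has no text:

```python
import re

p = re.compile(r'\[\d+\]')
test_str = "[1]first[2][3]third"  # hypothetical input: item 2 is empty

parts = p.split(test_str)
print(parts)  # ['', 'first', '', 'third']

# Filtering drops every empty string, including the legitimately empty item.
filtered = [x for x in parts if x]
print(filtered)  # ['first', 'third']

# Deleting only the first element keeps the empty item in its position.
res = p.split(test_str)
if not res[0]:
    del res[0]
print(res)  # ['first', '', 'third']
```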

Use out[2::2]. This takes the entries from the third one to the end, but only every second one.
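Applied to the list from the question, the slice starts at index 2 (`'first'`) and steps by 2, so it skips the captured numbers. This works only because the split output strictly alternates number and text, which is what a pattern with exactly one capturing group produces here.

```python
import re

thelist = "[1]first[2]second[3]third"
out = re.split(r'\[(.*?)\]', thelist)
# out == ['', '1', 'first', '2', 'second', '3', 'third']
# index:   0    1     2      3      4       5     6
print(out[2::2])  # ['first', 'second', 'third']
# The numbers sit at the odd indices, if you need them too.
print(out[1::2])  # ['1', '2', '3']
```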

If the format is always the same and the words contain no brackets, use findall and take the text after each closing bracket:

import re

s = "[1]first[2]second[3]third"
print(re.findall(r"\](\w+)", s))  # word characters right after each ]
# => ['first', 'second', 'third']

To handle spaces and similar characters, you can use a character class:

import re

s = "[1]first foo[2]second[3]third"
print(re.findall(r"\]([\w\s]+)", s))  # word characters or whitespace after each ]
# => ['first foo', 'second', 'third']
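A variation, assuming only that the item text never contains `[`: take everything after each `]` up to the next `[` with a negated character class. This keeps spaces, digits and punctuation without listing them one by one.

```python
import re

s = "[1]first foo[2]second, too[3]third!"  # illustrative input

# [^\[]+ means "one or more characters that are not '['".
print(re.findall(r'\]([^\[]+)', s))  # ['first foo', 'second, too', 'third!']
```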

If your string looks exactly as you describe, you can use a simple regex:

re.findall(r'[a-z]+', s)

findall already returns a list, so there is no need for split.

The output:
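Note that `[a-z]+` only matches runs of lowercase ASCII letters, so it depends on the items looking exactly like the example. Here is a quick check with an illustrative input containing a capital letter and a space:

```python
import re

s = "[1]First item[2]second"  # illustrative input

print(re.findall(r'[a-z]+', s))     # ['irst', 'item', 'second']
# Allowing upper-case letters fixes 'First' but still splits multi-word items.
print(re.findall(r'[A-Za-z]+', s))  # ['First', 'item', 'second']
```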

['first', 'second', 'third']

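I used a lookahead, with |$ as an alternative so that the last entry (which has no following [nn] tag) is matched too:

import re

# txt holds the input string (not shown in this answer)
# .+? is the non-greedy (lazy) character match
# (?=\[\d{2}\]|$) is the lookahead: stop just before the next [nn] tag or at the end
pattern = r"\[\d{2}\].+?(?=\[\d{2}\]|$)"
matches = re.findall(pattern, txt)
for match in matches:
    print("output", match)

Output: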
output [01] Final Step - Protonica 
output [02] Liquid Frequencies (Liquid Soul Mix) - Liquid Soul 
output [03] Global Illumination - Liquid Soul 
output [04] Devotion - Liquid Soul 
output [05] Black Rock City - Quantize 
output [06] Plazza Del Trripy - Andromeda 
output [07] Private Guide - Suntree 
output [08] Stereo Gun - Vibrasphere 
output [09] The Cycle - Ritree 
output [10] Atmonizer - Andromed
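The answer above does not show `txt`. The following is a minimal sketch with a short hypothetical track list in the same `[nn] Title - Artist` shape. The lazy `.+?` stops at the first position where the lookahead sees the next `[nn]` tag, and `|$` lets the final entry end at the end of the string. The `.strip()` call is an addition that removes the space left before each following tag.

```python
import re

# Hypothetical input in the same format as the answer's (unseen) txt.
txt = "[01] Final Step - Protonica [02] Devotion - Liquid Soul [03] Stereo Gun - Vibrasphere"

pattern = r"\[\d{2}\].+?(?=\[\d{2}\]|$)"
matches = [m.strip() for m in re.findall(pattern, txt)]  # drop trailing spaces
for match in matches:
    print("output", match)
```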
