Python regex split is not splitting what I need

I have this in my file:

import re
sample = """Name: @s
Owner: @a[tag=Admin]"""
target = r"@[sae](\[[\w{}=, ]*\])?"
regex = re.split(target, sample)
print(regex)

I want to split out every word that starts with @, like this:
["Name: ", "@s", "\nOwner: ", "@a[tag=Admin]"]

But instead it gives:
['Name: ', None, '\nOwner: ', '[tag=Admin]', '']

How can I separate them?

I would use re.findall here:

import re

sample = """Name: @s
Owner: @a[tag=Admin]"""
parts = re.findall(r'@\w+(?:\[.*?\])?|\s*\S+\s*', sample)
print(parts)  # ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]']

The regex pattern used here says to match:

@\w+            a tag such as @some_tag
(?:\[.*?\])?    followed by an optional [...] term
|               OR
\s*\S+\s*       any other non-whitespace term,
                including optional whitespace on both sides
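The same breakdown can be written directly into the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. This is only a sketch of the pattern above; the variable name pattern is illustrative.

```python
import re

# Same pattern as above, laid out with re.VERBOSE so each piece is commented.
pattern = re.compile(r"""
    @\w+            # a tag such as @s or @a
    (?:\[.*?\])?    # followed by an optional [...] term
    |               # OR
    \s*\S+\s*       # any other non-whitespace term, with optional whitespace around it
""", re.VERBOSE)

sample = """Name: @s
Owner: @a[tag=Admin]"""

parts = pattern.findall(sample)
print(parts)  # ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]']
```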

If I understand the requirement correctly, you can do it as follows:

import re

s = """Name: @s
Owner: @a[tag=Admin]
"""

rgx = r'(?=@.*)|(?=\r?\n[^@\r\n]*)'

# Splitting on empty matches requires Python 3.7 or later.
re.split(rgx, s)
#=> ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]', '\n']


The regular expression can be broken down as follows.

(?=           # begin a positive lookahead
  @.*         # match '@' followed by >= 0 chars other than line terminators
)             # end positive lookahead
|             # or
(?=           # begin a positive lookahead
  \r?\n       # match a line terminator
  [^@\r\n]*   # match >= 0 characters other than '@' and line terminators
)             # end positive lookahead

Note that the matches are zero-width.
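One way to see the zero-width matches is to list where they occur with re.finditer. This is a small sketch using the same string and pattern; the names positions and all_empty are illustrative.

```python
import re

s = "Name: @s\nOwner: @a[tag=Admin]\n"
rgx = r'(?=@.*)|(?=\r?\n[^@\r\n]*)'

# Each match consumes no characters, so start() equals end() for all of them.
positions = [(m.start(), m.end()) for m in re.finditer(rgx, s)]
all_empty = all(start == end for start, end in positions)

print(positions)  # [(6, 6), (8, 8), (16, 16), (29, 29)]
print(all_empty)  # True
```

The string is cut at exactly these offsets, and nothing is removed from it.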

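re.split expects the regex to match the delimiters in the string. Of each delimiter, it returns only the parts that are captured. In the case of your regex, that is only the part between the parentheses (if present).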
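A small sketch of that behaviour on a made-up string (the string "a1b2c" and the three patterns are illustrative only, not taken from the question):

```python
import re

text = "a1b2c"

# No capture group: the delimiters are dropped.
no_group = re.split(r"\d", text)

# Whole delimiter captured: the delimiters are kept in the result.
whole_group = re.split(r"(\d)", text)

# Optional group that never participates: None appears in its place.
optional_group = re.split(r"\d(x)?", text)

print(no_group)        # ['a', 'b', 'c']
print(whole_group)     # ['a', '1', 'b', '2', 'c']
print(optional_group)  # ['a', None, 'b', None, 'c']
```

The last case is what produces the None in the question's output: the optional bracket group did not match for @s.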

If you want the whole delimiter to show up in the list, put parentheses around the whole regex:

target = r"(@[sae](\[[\w{}=, ]*\])?)"

But you are better off not capturing the inner group. You can turn it into a non-capturing group by using (?:…) instead of (…):

target = r"(@[sae](?:\[[\w{}=, ]*\])?)"
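As a minimal sketch, here is the non-capturing version run against the sample from the question. Dropping the trailing empty string is an extra step added here on the assumption that it is unwanted; the name non_empty is illustrative.

```python
import re

sample = """Name: @s
Owner: @a[tag=Admin]"""

# Outer group captures the whole delimiter; the inner group is non-capturing.
target = r"(@[sae](?:\[[\w{}=, ]*\])?)"

parts = re.split(target, sample)
print(parts)  # ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]', '']

# re.split yields '' after a delimiter at the very end; filter it out if unwanted.
non_empty = [p for p in parts if p]
print(non_empty)  # ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]']
```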

In your output, [tag=Admin] is kept because that part is in a capture group, and split can also return empty strings.

Another option is to be specific about the allowed data format and, instead of splitting, capture the parts in two groups.

(\s*\w+:\s*)(@[sae](?:\[[\w{}=, ]*\])?)

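The pattern matches:

( Capture group 1
  \s*\w+:\s* Match 1+ word characters followed by :, between optional whitespace characters
) Close group
( Capture group 2
  @[sae] Match @, then one of s, a, or e
  (?:\[[\w{}=, ]*\])? Optionally match [...]
) Close group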

Example code:

import re
sample = """Name: @s
Owner: @a[tag=Admin]"""
target = r"(\s*\w+:\s*)(@[sae](?:\[[\w{}=, ]*\])?)"
listOfTuples = re.findall(target, sample)
lst = [s for tpl in listOfTuples for s in tpl]
print(lst)

Output

['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]']

See the regex demo and the Python demo.
