How do I write a regex so that the whole expression is a set containing two possible groups?

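I am trying to find instances of Roman numerals in a text that are followed by a full stop and a space, like 'IV. '. These mark the start of a verse. However, some verses do not start with a Roman numeral, so I have inserted a [NV] tag at the start of those verses. I have a regex that finds the numerals and a regex that finds the [NV] tags, but I cannot combine them into a single regex that looks for either one.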

My regex for finding the numerals is:

numeralpat = re.compile(r'[IVX]{1,4}. ')

I thought I could put it in a set together with the other regex, to find either a numeral or an [NV] tag:

numeralpat = re.compile(r'[([NV])([IVX]{1,4}. )]')

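This causes problems between brackets of the same type, so I tried escaping different characters to make it work. None of those attempts worked for me. Can this be done with a regex?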

Edit to add sample text:

Text:

I. this is some text with a verse numeral
II. this is some text with a verse numeral
III. this is some text with a verse numeral
[NV]this is text with no verse numeral
IV. this is some text with a verse numeral
V. this is some text with a verse numeral

Expected matches:

'I. '
'II. '
'III. '
'[NV]'
'IV. '
'V. '

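You can specify alternatives with the | operator: r'(abc|def)' finds 'abc' or 'def'. You should also escape the square brackets so that the pattern looks for the explicit text [NV] rather than a single 'N' or 'V':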

import re

# The brackets and the full stop are escaped so that they match literally.
regex = r"(\[NV\]|[IVX]{1,4}\.)"
test_str = ("I. Some text\n"
            "some Text\n"
            "II. some text\n"
            "[NV] more text\n")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(
        matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))
    # Group numbers start at 1; group 0 is the whole match.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(
            groupNum=groupNum, start=match.start(groupNum),
            end=match.end(groupNum), group=match.group(groupNum)))

Output:

Match 1 was found at 0-2: I.
Group 1 found at 0-2: I.
Match 2 was found at 23-26: II.
Group 1 found at 23-26: II.
Match 3 was found at 37-41: [NV]
Group 1 found at 37-41: [NV]

See https://regex101.com/r/MpMxcP/1

It looks for either the literal text '[NV]', or 1 to 4 characters from the set '[IVX]' followed by a literal '.'.
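To illustrate why the escaping matters, here is a minimal sketch comparing the unescaped and escaped forms on the [NV] line from the question. The variable names are made up for this illustration.

```python
import re

# Unescaped: [NV] is a character class that matches ONE character, 'N' or 'V'.
unescaped = re.compile(r"[NV]")
# Escaped: \[NV\] matches the four literal characters "[NV]".
escaped = re.compile(r"\[NV\]")

sample = "[NV]this is text with no verse numeral"

unescaped_hits = unescaped.findall(sample)
escaped_hits = escaped.findall(sample)

print(unescaped_hits)  # ['N', 'V']
print(escaped_hits)    # ['[NV]']
```

The same reasoning applies to the full stop: an unescaped . matches any character except a newline, while \. matches only a literal full stop.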

You can combine the two regexes using alternation like this:

(?:\[NV\]|[IVX]{1,4}\. )

This matches either the literal tag [NV], or any of the characters I, V, X repeated 1 to 4 times and followed by a . and a space.
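As a rough sketch of how this could be used on the sample text from the question, the following compiles the alternation with the brackets and the full stop escaped, then collects every match. The names verse_start and found are chosen for this example only.

```python
import re

# Non-capturing alternation: the literal tag "[NV]", or 1-4 of I/V/X
# followed by a literal full stop and a space.
verse_start = re.compile(r"(?:\[NV\]|[IVX]{1,4}\. )")

text = (
    "I. this is some text with a verse numeral\n"
    "II. this is some text with a verse numeral\n"
    "III. this is some text with a verse numeral\n"
    "[NV]this is text with no verse numeral\n"
    "IV. this is some text with a verse numeral\n"
    "V. this is some text with a verse numeral\n"
)

# With no capturing groups, findall returns the full text of each match.
found = verse_start.findall(text)
print(found)  # ['I. ', 'II. ', 'III. ', '[NV]', 'IV. ', 'V. ']
```

Because the group is non-capturing, findall returns whole matches rather than group contents. If numerals could also appear in the middle of a line, one option is to anchor the pattern with ^ and pass re.MULTILINE so that only matches at the start of a line count.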

