Regex to match a word, but only if another phrase does not appear?



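I'm usually pretty comfortable with regex, but I'm struggling with this one. I need a regular expression that matches the term cbd, but not if the phrase central business district appears anywhere in the search string. Or, if that is too hard, one that at least matches cbd when the phrase central business district does not appear before the term cbd. The match should return only the cbd part, so I've been using lookaheads/lookbehinds, but I can't get it to meet the requirements...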

Input examples:
GOOD: Any products containing CBD are to be regulated.
BAD: Properties located within the Central Business District (CBD) are to be regulated

I've tried:

  • (?!central business district)cbd
  • (.*(?!central business district).*)cbd
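A quick check against the two example sentences shows why both attempts still match the BAD input. This is a sketch that assumes case-insensitive matching (re.I), since the examples write CBD in capitals; the variable names are illustrative only.

```python
import re

good = "Any products containing CBD are to be regulated."
bad = "Properties located within the Central Business District (CBD) are to be regulated"

# Attempt 1: the lookahead is tested at the position where "cbd" starts, so it
# only asks whether "central business district" starts at that same spot.
# It never does (the text there is "cbd"), so the lookahead cannot block a match.
attempt1 = re.compile(r"(?!central business district)cbd", re.I)

# Attempt 2: the first ".*" can stop at any position where the phrase does not
# start, the lookahead passes there, and the rest matches, so BAD matches too.
attempt2 = re.compile(r"(.*(?!central business district).*)cbd", re.I)

for name, pattern in [("attempt1", attempt1), ("attempt2", attempt2)]:
    for label, text in [("GOOD", good), ("BAD", bad)]:
        m = pattern.search(text)
        print(name, label, m.group(0) if m else None)
```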

This is with the re module in Python 3.6+.
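The obvious fix, a negative lookbehind containing .*, is not available in the standard re module, which only accepts fixed-width lookbehinds. A minimal check (the pattern string here is just an illustration):

```python
import re

pattern = r"(?<!central business district.*)cbd"

try:
    re.compile(pattern)
    supported = True
except re.error as exc:
    # The standard library only allows fixed-width look-behind,
    # so ".*" inside (?<!...) is rejected at compile time.
    supported = False
    print("re.error:", exc)
```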

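I know this would be easy to do with a few lines of code, but we keep a list of regex strings in a database and use it to search a corpus for documents that match any one of those strings. It's better to avoid hard-coding keywords in the script, because our other developers couldn't see them in the database and wouldn't know where those matches come from.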
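One workaround that stays within re, assuming the code that reads the database can use group(1) instead of the whole match, is to put a negative lookahead at the very start of the string and capture cbd in a group. This is a swapped-in technique, not the answer below; PATTERN and find_cbd are hypothetical names.

```python
import re

# Hypothetical pattern string, as it might be stored in the database.
# (?is): case-insensitive, and "." also matches newlines.
# \A(?!.*central business district): reject the string if the phrase occurs anywhere.
# .*?\b(cbd)\b: otherwise capture the first standalone "cbd".
PATTERN = r"(?is)\A(?!.*central business district).*?\b(cbd)\b"

compiled = re.compile(PATTERN)


def find_cbd(text):
    """Return the captured 'cbd' text, or None if the string is excluded."""
    m = compiled.search(text)
    return m.group(1) if m else None


print(find_cbd("Any products containing CBD are to be regulated."))  # CBD
print(find_cbd("Properties located within the Central Business District (CBD) are to be regulated"))  # None
```

Because the flags are written inline as (?is), the pattern string carries everything it needs and could be stored in the database as-is.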

Use the PyPI regex module, which (unlike re) supports variable-length lookbehinds:

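import regex  # PyPI package: pip install regex

strings = [' I need a regular expression that matches the term cbd but not if the phrase central business district appears anywhere else in the search string.', 'I need cbd here.']
for s in strings:
    # Negative lookbehind: no "central business district" anywhere before cbd.
    # Negative lookahead: no "central business district" anywhere after cbd.
    # regex.S (DOTALL) lets ".*" match across newlines.
    x = regex.search(r'(?<!central business district.*)cbd(?!.*central business district)', s, regex.S)
    if x:
        print(s, x.group(), sep=" => ")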

Result: I need cbd here. => cbd. See the Python code above.
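The pattern in the answer is case-sensitive, so it does not match the capitalized CBD in the question's examples. A sketch with the case-insensitive flag added inline (an assumption about the desired behavior, not part of the original answer; it needs the third-party regex package):

```python
import regex  # third-party: pip install regex

# The answer's pattern with (?i) added inline so it also handles the
# capitalized "CBD" / "Central Business District" from the question.
# (?s) lets ".*" cross newlines, like regex.S in the answer.
PATTERN = r"(?si)(?<!central business district.*)cbd(?!.*central business district)"

examples = {
    "GOOD": "Any products containing CBD are to be regulated.",
    "BAD": "Properties located within the Central Business District (CBD) are to be regulated",
}

for label, text in examples.items():
    m = regex.search(PATTERN, text)
    print(label, "=>", m.group() if m else None)
```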

Explanation

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    central business         'central business district'
    district
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  cbd                      'cbd'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    central business         'central business district'
    district
--------------------------------------------------------------------------------
  )                        end of look-ahead
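The two lookarounds can also be used separately. Keeping only the lookbehind gives the question's fallback requirement (reject cbd only when the phrase appears before it). The sketch below, which assumes the same inline (?si) flags and the third-party regex package, contrasts that variant with the full pattern on a sentence where the phrase comes after cbd.

```python
import regex  # third-party: pip install regex

# Look-behind half only: reject "cbd" when the phrase appears BEFORE it.
BEFORE_ONLY = r"(?si)(?<!central business district.*)cbd"
# Full pattern: reject "cbd" when the phrase appears before OR after it.
FULL = r"(?si)(?<!central business district.*)cbd(?!.*central business district)"

text = "CBD is short for Central Business District."
print(bool(regex.search(BEFORE_ONLY, text)))  # True: the phrase only follows cbd
print(bool(regex.search(FULL, text)))         # False: the look-ahead sees the phrase
```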
