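I'm usually comfortable with regex, but I'm struggling with this one. I need a regular expression that matches the term cbd, but not if the phrase central business district appears anywhere in the search string. Or, if that is too hard, one that at least matches cbd when the phrase central business district does not appear anywhere before the term cbd. Only the cbd part should be returned as the result, so I have been using lookaheads/lookbehinds, but I have not been able to meet the requirement.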
Example inputs:
GOOD: Any products containing CBD are to be regulated.
BAD: Properties located within the Central Business District (CBD) are to be regulated
I tried:
(?!central business district)cbd
(.*(?!central business district).*)cbd
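This is in Python 3.6+ using the re module.
I know this would be easy to do in a few lines of code, but we keep a list of regex strings in a database, and they are used to search a corpus for documents matching any of those regex strings. It is best to avoid hard-coding any keywords in the script, because then our other developers could not tell where these matches come from, since they would not see them in the database.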
Use the PyPI regex module, which supports variable-length lookbehind, with the pattern shown in this code:
import regex

strings = [' I need a regular expression that matches the term cbd but not if the phrase central business district appears anywhere else in the search string.', 'I need cbd here.']
for s in strings:
    # regex.S lets . match newlines, so the phrase is found anywhere in a multi-line string
    x = regex.search(r'(?<!central business district.*)cbd(?!.*central business district)', s, regex.S)
    if x:
        # Print only strings where cbd matched, followed by the matched text
        print(s, x.group(), sep=" => ")
Result: I need cbd here. => cbd
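The reason for switching modules can be checked directly: the standard-library re module only accepts fixed-width lookbehinds, so it refuses to compile the pattern above. The sketch below is only an illustration; the helper name compiles_with_stdlib_re is made up for this example.

```python
import re

PATTERN = r'(?<!central business district.*)cbd(?!.*central business district)'


def compiles_with_stdlib_re(pattern):
    """Return True if the standard-library re module accepts the pattern.

    Hypothetical helper, used here only to show the difference between re
    and the PyPI regex module.
    """
    try:
        re.compile(pattern, re.S)
        return True
    except re.error:
        # re raises an error for lookbehinds that are not fixed-width
        return False


# The lookbehind contains .*, so its width is not fixed
print(compiles_with_stdlib_re(PATTERN))
# A lookahead may have any length, so this part alone compiles
print(compiles_with_stdlib_re(r'cbd(?!.*central business district)'))
```

The first call prints False and the second prints True, which is why the lookahead half of the pattern works in re but the lookbehind half needs regex.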
Explanation
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    central business       'central business district'
    district
--------------------------------------------------------------------------------
    .*                     any character, including \n because regex.S
                           is set (0 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  cbd                      'cbd'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                     any character, including \n because regex.S
                           is set (0 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
    central business       'central business district'
    district
--------------------------------------------------------------------------------
  )                        end of look-ahead
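If installing the regex module is not an option, one possible workaround with the standard re module is to anchor at the start of the string, reject the phrase with a single lookahead, and read cbd from a capture group. This is only a sketch under two assumptions that are not in the question: the caller can read group 1 instead of the whole match, and matching should ignore case (the example inputs use CBD and Central Business District in capitals). The names STDLIB_PATTERN and find_cbd are made up for illustration.

```python
import re

# Assumption: the consumer reads capture group 1, because the whole match
# also includes the text that precedes cbd.
STDLIB_PATTERN = re.compile(
    r'\A(?!.*central business district).*?(cbd)',
    re.I | re.S,  # ignore case; let . match newlines
)


def find_cbd(text):
    """Return the matched cbd text, or None if the phrase appears anywhere."""
    m = STDLIB_PATTERN.search(text)
    return m.group(1) if m else None


good = "Any products containing CBD are to be regulated."
bad = "Properties located within the Central Business District (CBD) are to be regulated"

print(find_cbd(good))  # CBD
print(find_cbd(bad))   # None
```

Because the lookahead runs once at the start of the string, it covers the phrase both before and after cbd. The cost is that the full match is no longer just cbd, so this may not fit a pipeline that only looks at the whole match.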