How can I use a regex in Ruby to find a set of distinct words within range of each other?



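I want to create a regular expression that meets the following requirements:

1) It must act as an "AND" statement: both words have to be present.

2) The two words must be within a certain range of each other.

3) It must not count the same word twice.

So far I have this working regex, which satisfies 1 and 2.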

/(word1|word2)(?:\W+\w+){0,3}?\W+(word1|word2)/i

Example regex:
/(cat|dog)(?:\W+\w+){0,3}?\W+(cat|dog)/i

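Strings that currently match

  • The cat scared the other cat.

  • The cat likes the dog.

  • The dog likes the cat.

  • Dog hates dog.

Strings I do not want to match

  • The cat scared the other cat.

  • Dog hates dog.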

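A phrase like "The cat scared the other cat" matches this regex because the second group accepts any of the words, including cat. However, I don't want it to look for the word it has already found. Once the first group has matched cat, I only want the second group to search for dog.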
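The behavior described above can be reproduced with a short sketch. The names `ORIGINAL_PATTERN`, `SAMPLE_SENTENCES`, and `matched_pairs` are made up for illustration; only the regex itself comes from the question.

```ruby
# The question's pattern: either word, at most three words in between,
# then either word again (so the same word can fill both groups).
ORIGINAL_PATTERN = /(cat|dog)(?:\W+\w+){0,3}?\W+(cat|dog)/i

SAMPLE_SENTENCES = [
  "The cat scared the other cat.",
  "The cat likes the dog.",
  "The dog likes the cat.",
  "Dog hates dog."
].freeze

# Returns the two captured words for each sentence, or nil when there is no match.
def matched_pairs(pattern, sentences)
  sentences.map do |sentence|
    m = pattern.match(sentence)
    m && [m[1], m[2]]
  end
end

matched_pairs(ORIGINAL_PATTERN, SAMPLE_SENTENCES).each { |pair| p pair }
# ["cat", "cat"]   <- unwanted: same word twice
# ["cat", "dog"]
# ["dog", "cat"]
# ["Dog", "dog"]   <- unwanted: same word twice
```

Printing the captured pairs makes it easy to see that nothing in the pattern ties the second group to what the first group captured.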

How about:

/(cat|dog)(?:\W+\w+){0,3}?\W+(?!\1)(cat|dog)/

Explanation:

The regular expression:
(?-imsx:(cat|dog)(?:\W+\w+){0,3}?\W+(?!\1)(cat|dog))
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to 1:
----------------------------------------------------------------------
    cat                      'cat'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    dog                      'dog'
----------------------------------------------------------------------
  )                        end of 1
----------------------------------------------------------------------
  (?:                      group, but do not capture (between 0 and 3
                           times (matching the least amount
                           possible)):
----------------------------------------------------------------------
    \W+                     non-word characters (all but a-z, A-Z,
                             0-9, _) (1 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
    \w+                     word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  ){0,3}?                  end of grouping
----------------------------------------------------------------------
  \W+                     non-word characters (all but a-z, A-Z, 0-
                           9, _) (1 or more times (matching the most
                           amount possible))
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    \1                      what was matched by capture 1
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (                        group and capture to 2:
----------------------------------------------------------------------
    cat                      'cat'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    dog                      'dog'
----------------------------------------------------------------------
  )                        end of 2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
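As a rough check of the negative look-ahead approach, here is a minimal Ruby sketch. The helper name `distinct_pair_regex` and its `max_gap` parameter are hypothetical, introduced only for illustration. The `/i` flag is carried over from the question's pattern (the answer's version omits it), and the sketch assumes that Ruby compares the `\1` backreference case-insensitively under `/i`.

```ruby
# Builds the pattern for any word list: the first captured word, at most
# max_gap words in between, then a listed word that does not begin with
# whatever group 1 captured.
def distinct_pair_regex(words, max_gap: 3)
  alternation = Regexp.union(words).source # e.g. "cat|dog"
  /(#{alternation})(?:\W+\w+){0,#{max_gap}}?\W+(?!\1)(#{alternation})/i
end

pattern = distinct_pair_regex(%w[cat dog])

[
  "The cat scared the other cat.",
  "The cat likes the dog.",
  "The dog likes the cat.",
  "Dog hates dog."
].each do |sentence|
  puts "#{sentence.inspect} => #{pattern.match?(sentence)}"
end
# "The cat scared the other cat." => false
# "The cat likes the dog." => true
# "The dog likes the cat." => true
# "Dog hates dog." => false
```

Note that the pattern has no word boundaries, so "cat" inside a longer word such as "category" would also count; adding `\b` around the groups is one possible refinement if that matters.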
