Regex - Find words of a specific length between two specified words



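Hello,

I recently started using regex (in Java) and ran into a problem that I need some help/guidance with.

I want to find words of a certain length (in this case, 4 characters or longer) between the words jack and james.

Here is the text I am using to test the regex.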

james was playing with jack yesterday (line 1)
jack was playing with james yersterday (line 2)
jack and james are best friends (line 3)
james will be helping jack with his homework (line 4)
yesterday, james come over jack's house (line 5)

I would like to get the following:

playing with(line 1)
playing with(line 2)
no matches(line 3)
will helping(line 4)
come over(line 5)

I came up with the following:

(?<=james)(.*)(?=jack)|(?<=jack)(.*)(?=james)

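However, this particular regex returns all of the characters between the two words. I also tried the following, without success (I tried many other approaches too, before frustration started to take over). Also, I omitted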

(?<=james)(\bw{4,}\b)(?=jack)|(?<=jack)(\bw{4,}\b)(?=james)

Any guidance would be greatly appreciated.

Sincerely

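This seems to work as required.

  • (?<=) is a positive lookbehind for either of the two names
  • (?=) is a positive lookahead for either of the two names
  • \w{4,} matches words of four or more characters
  • .* inside each assertion skips over the characters between the name and the word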
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String[] lines = {"james was playing with jack yesterday (line 1)",
        "jack was playing with james yersterday (line 2)",
        "jack and james are best friends (line 3)",
        "james will be helping jack with his homework (line 4)",
        "yesterday, james come over jack's house (line 5)"};
// Backslashes are doubled because this is a Java string literal.
// Note: older Java versions may reject the unbounded .* inside the lookbehind.
Pattern p = Pattern.compile("(?<=(?:jack|james).*)(\\w{4,})(?=.*(?:jack|james))");
for (String line : lines) {
    Matcher m = p.matcher(line);
    // Set once a word is found, so a newline is printed only for lines with matches.
    boolean flag = false;
    while (m.find()) {
        flag = true;
        System.out.print(m.group(1) + " ");
    }
    if (flag) {
        System.out.println();
    }
}

Prints

playing with 
playing with 
will helping 
come over 
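
If the lookbehind with .* is a problem on your Java version, here is a rough sketch of a two-step alternative. It is not the pattern from the answer above: it first captures the text between one name and the other name, then collects the words of four or more characters from that span. The class name BetweenNames and the method longWordsBetween are made up for illustration, and the sketch assumes only the first jack/james pair on a line matters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class BetweenNames {
    // Step 1: one name, then lazily everything up to the other (different) name.
    // (?!\1\b) rejects a second occurrence of the name captured in group 1.
    private static final Pattern SPAN =
            Pattern.compile("\\b(jack|james)\\b(.*?)\\b(?!\\1\\b)(?:jack|james)\\b");

    // Step 2: whole words made of four or more word characters.
    private static final Pattern WORD = Pattern.compile("\\b\\w{4,}\\b");

    static List<String> longWordsBetween(String line) {
        List<String> words = new ArrayList<>();
        Matcher span = SPAN.matcher(line);
        if (span.find()) {                       // assumption: only the first pair counts
            Matcher word = WORD.matcher(span.group(2));
            while (word.find()) {
                words.add(word.group());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String[] lines = {
            "james was playing with jack yesterday (line 1)",
            "jack was playing with james yersterday (line 2)",
            "jack and james are best friends (line 3)",
            "james will be helping jack with his homework (line 4)",
            "yesterday, james come over jack's house (line 5)"
        };
        for (String line : lines) {
            List<String> words = longWordsBetween(line);
            System.out.println(words.isEmpty() ? "no matches" : String.join(" ", words));
        }
    }
}
```

Splitting the work this way keeps both patterns free of lookbehind. One caveat: if the same name appears twice before the other name shows up, the repeated name falls inside the span and is reported as a word.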

Use

(?:\G(?<!^)|(jack|james))(?:\W+\w{1,3})*\W+(\w{4,})(?=(?:(?!\1).)*?(?!\1)(jack|james))

See proof. The values you need are saved in group 2.

Explanation

EXPLANATION
--------------------------------------------------------------------------------
(?:                      group, but do not capture:
--------------------------------------------------------------------------------
\G                       where the last m//g left off
--------------------------------------------------------------------------------
(?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
^                        the beginning of the string
--------------------------------------------------------------------------------
)                        end of look-behind
--------------------------------------------------------------------------------
|                        OR
--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
jack                     'jack'
--------------------------------------------------------------------------------
|                        OR
--------------------------------------------------------------------------------
james                    'james'
--------------------------------------------------------------------------------
)                        end of \1
--------------------------------------------------------------------------------
)                        end of grouping
--------------------------------------------------------------------------------
(?:                      group, but do not capture (0 or more times
                         (matching the most amount possible)):
--------------------------------------------------------------------------------
\W+                      non-word characters (all but a-z, A-Z, 0-9, _)
                         (1 or more times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
\w{1,3}                  word characters (a-z, A-Z, 0-9, _)
                         (between 1 and 3 times (matching the most
                         amount possible))
--------------------------------------------------------------------------------
)*                       end of grouping
--------------------------------------------------------------------------------
\W+                      non-word characters (all but a-z, A-Z, 0-9, _)
                         (1 or more times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
(                        group and capture to \2:
--------------------------------------------------------------------------------
\w{4,}                   word characters (a-z, A-Z, 0-9, _) (at
                         least 4 times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
)                        end of \2
--------------------------------------------------------------------------------
(?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
(?:                      group, but do not capture (0 or more
                         times (matching the least amount
                         possible)):
--------------------------------------------------------------------------------
(?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
\1                       what was matched by capture \1
--------------------------------------------------------------------------------
)                        end of look-ahead
--------------------------------------------------------------------------------
.                        any character except \n
--------------------------------------------------------------------------------
)*?                      end of grouping
--------------------------------------------------------------------------------
(?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
\1                       what was matched by capture \1
--------------------------------------------------------------------------------
)                        end of look-ahead
--------------------------------------------------------------------------------
(                        group and capture to \3:
--------------------------------------------------------------------------------
jack                     'jack'
--------------------------------------------------------------------------------
|                        OR
--------------------------------------------------------------------------------
james                    'james'
--------------------------------------------------------------------------------
)                        end of \3
--------------------------------------------------------------------------------
)                        end of look-ahead

Java:

import java.util.*;
import java.util.regex.*;

class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        // Backslashes are doubled because this is a Java string literal.
        String regex = "(?:\\G(?<!^)|(jack|james))(?:\\W+\\w{1,3})*\\W+(\\w{4,})(?=(?:(?!\\1).)*?(?!\\1)(jack|james))";
        String string = "james was playing with jack yesterday";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(string);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            // Group 2 holds each word of four or more characters.
            results.add(matcher.group(2));
        }
        System.out.println(String.join(" ", results));
    }
}

Result: playing with
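
To check the same pattern against all five lines from the question, something like the following sketch could be used. The class name AllLinesCheck and the method matchesIn are invented for this example. The regex is the one shown above, split across two string literals only for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical test harness, named only for this example.
class AllLinesCheck {
    // Same pattern as in the answer, with backslashes doubled for Java.
    private static final Pattern PATTERN = Pattern.compile(
            "(?:\\G(?<!^)|(jack|james))(?:\\W+\\w{1,3})*\\W+(\\w{4,})"
            + "(?=(?:(?!\\1).)*?(?!\\1)(jack|james))");

    // Collects group 2 (the word of four or more characters) from every match.
    static List<String> matchesIn(String line) {
        List<String> results = new ArrayList<>();
        Matcher matcher = PATTERN.matcher(line);
        while (matcher.find()) {
            results.add(matcher.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String[] lines = {
            "james was playing with jack yesterday (line 1)",
            "jack was playing with james yersterday (line 2)",
            "jack and james are best friends (line 3)",
            "james will be helping jack with his homework (line 4)",
            "yesterday, james come over jack's house (line 5)"
        };
        for (String line : lines) {
            List<String> words = matchesIn(line);
            System.out.println(words.isEmpty() ? "no matches" : String.join(" ", words));
        }
    }
}
```

Each call to find() resumes where the previous match ended, which is what lets \G chain the words that follow a name. On the sample lines this should reproduce the output requested in the question, with "no matches" for line 3.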
