Regex for the Nth match, or the last match if there are fewer than N matches



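I'm trying to find the nth match, or the last match if there are fewer than n matches. n is determined in my program, and the regex string is built by substituting an integer for "n".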

This is my best guess, but my repetition operator {1,n} always matches only once. I thought it would be greedy by default.

The basic regex would be:

distinctiveString[\s\S]*?value="([^"]*)"

So I modified it to this to try to get the nth one instead:

(?:distinctiveString[\s\S]*?){1,n}value="([^"]*)"

Sample input:

distinctiveString randomStuff value="val1"
moreRandomStuff
distinctiveString randomStuff value="val2"
moreRandomStuff
distinctiveString randomStuff value="val3"
moreRandomStuff
distinctiveString randomStuff value="val4"
moreRandomStuff
distinctiveString randomStuff value="val5"

So in this case, what I want is "val2" when n=2, "val5" when n=5, and also "val5" when n=8.

I'm passing my regex through an application layer, but I believe it is handed straight to Perl.

Try something like this:

(?:(?:[\s\S]*?distinctiveString){4}[\s\S]*?|(?:[\s\S]*distinctiveString)[\s\S]*?)value="([^"]*)"

which will have "val3" in match group 1 for the input:

distinctiveString randomStuff value="val1"
moreRandomStuff
distinctiveString randomStuff value="val2"
moreRandomStuff
distinctiveString randomStuff value="val3"

A quick breakdown of the pattern:

(?:                                           #
  (?:[\s\S]*?distinctiveString){4}[\s\S]*?    # match 4 'distinctiveString's
  |                                           # OR
  (?:[\s\S]*distinctiveString)[\s\S]*?        # match the last 'distinctiveString'
)                                             #
value="([^"]*)"                               #
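
As a rough check of the breakdown above, here is a minimal sketch that runs the same pattern (with n fixed at 4) against inputs with three and five entries. The class name `FourthOrLastDemo` and the helpers `sample` and `extract` are illustrative names invented for this sketch, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FourthOrLastDemo {
    // The pattern from the breakdown: 4th occurrence, or the last one if there are fewer.
    static final Pattern FOURTH_OR_LAST = Pattern.compile(
            "(?:(?:[\\s\\S]*?distinctiveString){4}[\\s\\S]*?"
            + "|(?:[\\s\\S]*distinctiveString)[\\s\\S]*?)"
            + "value=\"([^\"]*)\"");

    // Builds input shaped like the question's sample, with `count` entries (val1..valN).
    static String sample(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= count; i++) {
            sb.append("distinctiveString randomStuff value=\"val").append(i).append("\"\n");
            sb.append("moreRandomStuff\n");
        }
        return sb.toString();
    }

    // Returns the captured value of the first match, or null if nothing matches.
    static String extract(String text) {
        Matcher m = FOURTH_OR_LAST.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract(sample(3))); // fewer than 4 entries: falls back to the last, val3
        System.out.println(extract(sample(5))); // at least 4 entries: the 4th, val4
    }
}
```

The first alternative is tried first, so the greedy "last occurrence" branch is only reached when four occurrences cannot be found.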

Judging by your profile, you seem to be most active in the Java tag, so here's a small Java demo:

import java.util.regex.*;
public class Main {
    private static String getNthMatch(int n, String text, String distinctive) {
        String regex = String.format(
                "(?xs)                 # enable comments and dot-all           \n" +
                "(?:                   # start non-capturing group 1           \n" +
                "  (?:.*?%s){%d}       #   match n 'distinctive' strings       \n" +
                "  |                   #   OR                                  \n" +
                "  (?:.*%s)            #   match the last 'distinctive' string \n" +
                ")                     # end non-capturing group 1             \n" +
                ".*?value=\"([^\"]*)\" # match the value                       \n",
                distinctive, n, distinctive
        );
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }
    public static void main(String[] args) throws Exception {
        String text = "distinctiveString randomStuff value=\"val1\" \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val2\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val3\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val4\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val5\"         ";
        String distinctive = "distinctiveString";
        System.out.println(getNthMatch(4, text, distinctive));
        System.out.println(getNthMatch(5, text, distinctive));
        System.out.println(getNthMatch(6, text, distinctive));
        System.out.println(getNthMatch(7, text, distinctive));
    }
}

It prints the following to the console:

val4
val5
val5
val5

Note that when the dot-all option ((?s)) is enabled, . matches the same characters as [\s\S].
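
To illustrate the note on dot-all, here is a small sketch comparing `.` with and without `(?s)` against `[\s\S]` on a two-line string. The class name `DotAllDemo` and the helper `matchesWhole` are made up for this example.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // True if the whole text matches the regex.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        String twoLines = "a\nb";
        // By default, '.' does not match a line terminator.
        System.out.println(matchesWhole("a.b", twoLines));         // false
        // With the inline dot-all flag, '.' matches the newline as well.
        System.out.println(matchesWhole("(?s)a.b", twoLines));     // true
        // [\s\S] matches any character regardless of flags.
        System.out.println(matchesWhole("a[\\s\\S]b", twoLines));  // true
    }
}
```

In Java, passing `Pattern.DOTALL` to `Pattern.compile` has the same effect as the inline `(?s)` flag.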

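EDIT

Yes, {1,n} is greedy. However, when you put [\s\S]*? after distinctiveString, as in (?:distinctiveString[\s\S]*?){1,3}, then distinctiveString is matched, followed reluctantly by zero or more characters (so as few as possible, starting from zero), and that group is then repeated 1 to 3 times. Because the reluctant part stops growing as soon as the first value="..." can match, the repetition never has to advance past the first occurrence. What you want to do is move the [\s\S]*? before distinctiveString: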

import java.util.regex.*;
public class Main {
    private static String getNthMatch(int n, String text, String distinctive) {
        String regex = String.format(
                "(?:[\\s\\S]*?%s){1,%d}[\\s\\S]*?value=\"([^\"]*)\"",
                distinctive, n
        );
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }
    public static void main(String[] args) throws Exception {
        String text = "distinctiveString randomStuff value=\"val1\" \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val2\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val3\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val4\"       \n" +
                "moreRandomStuff                                    \n" +
                "distinctiveString randomStuff value=\"val5\"         ";
        String distinctive = "distinctiveString";
        System.out.println(getNthMatch(4, text, distinctive));
        System.out.println(getNthMatch(5, text, distinctive));
        System.out.println(getNthMatch(6, text, distinctive));
        System.out.println(getNthMatch(7, text, distinctive));
    }
}

It also prints:

val4
val5
val5
val5
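
To make the difference between the two placements concrete, the following sketch runs the question's original pattern and the rearranged one side by side on the five-entry sample. The class `QuantifierPlacementDemo` and its helpers (`sample`, `trailingLazy`, `leadingLazy`, `firstGroup`) are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierPlacementDemo {
    // Builds input shaped like the question's sample, with `count` entries (val1..valN).
    static String sample(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= count; i++) {
            sb.append("distinctiveString randomStuff value=\"val").append(i).append("\"\n");
            sb.append("moreRandomStuff\n");
        }
        return sb.toString();
    }

    // Question's attempt: the reluctant [\s\S]*? comes AFTER distinctiveString.
    static String trailingLazy(int n) {
        return "(?:distinctiveString[\\s\\S]*?){1," + n + "}value=\"([^\"]*)\"";
    }

    // Rearranged version: the reluctant [\s\S]*? comes BEFORE distinctiveString.
    static String leadingLazy(int n) {
        return "(?:[\\s\\S]*?distinctiveString){1," + n + "}[\\s\\S]*?value=\"([^\"]*)\"";
    }

    // Returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = sample(5);
        System.out.println(firstGroup(trailingLazy(3), text)); // val1: stops at the first value
        System.out.println(firstGroup(leadingLazy(2), text));  // val2
        System.out.println(firstGroup(leadingLazy(5), text));  // val5
        System.out.println(firstGroup(leadingLazy(8), text));  // val5: fewer than 8, so the last
    }
}
```

With the reluctant part in front, each repetition has to reach the next distinctiveString, so the greedy {1,n} consumes up to n occurrences before value="..." is matched.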
