Is it possible to build a modified pattern so that, when split is applied, the separators are whatever does not match the base pattern?

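While using String.split() recently, I ran into a situation where the text was so dynamic that extracting the matches was easier than filtering out the non-matches.

I found myself wondering whether it is possible to write a "reverse regex" for String.split(), so that you could give it any pattern and it would match every run of characters that does not match that pattern.

Note: the "problem" here is easily solved with String.matches(), tokens, Matcher.group(), and so on. This question is mostly hypothetical (code examples are still welcome, since the nature of the question pretty much calls for them), and it is not about how to achieve the result but about whether it can be achieved this way.

What I have tried: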

    String pattern1 = "(test)"; //A verif. that what "should-not-match" is working correctly.
    String pattern2 = "[^(test)]"; //FAIL - unmatches the letters separately.
    String pattern3 = "(^(test))"; //FAIL - does not match anything, it seems.
    String text = ""
            + "This is a test. "
            + "This test should (?not?) match the word \"test\", whenever it appears.\n"
            + "This is about to test if a \"String.split()\" can be used in a different way.\n"
            + "By the way, \"testing\" does not equal \"test\","
            + "but it will split in the middle because it contains \"test\".";
    for (String s : text.split(pattern3)) {
        System.out.println(s);
    }

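I tried other similar patterns as well, and none of them came close to working.

UPDATE:

I have now also tried a few patterns that use the special constructs, but with no success either.

As for what I want: following the "test" example, I want to get an array of strings whose contents are "test" (the thing I want to use as the base pattern or, in other words, the thing I want to find).

But doing this with String.split() means that using the base pattern directly yields "whatever is not (test)", so the pattern needs to be reversed in order to yield "only the occurrences of (test)".

Bible-sized long story short, what I want is a regex for String.split() that produces the behavior (and result) below. Note: this follows the example code above, including the required variable (text).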

    String[] trash = text.split("test"); //<-base pattern, needs reversing.
    System.out.println("\n\nWhat should match the split-pattern (due reversal), become separators, and be filtered out:");
    for (String s : trash) {
        System.out.println("[" + s + "]");
        text = text.replace(s, "%!%"); //<-simulated wanted behavior.
    }
    System.out.println("\n\nWhat should be the resulting String[]:");
    for (String s : text.split("%!%")) {
        System.out.println(s);
    }
    System.out.println("Note: There is a blank @ index [0], since if the text does not start with \"test\", there is a sep. between. This is NOT WRONG.");

Code examples are welcome. Whether such code can be written (or not) is, after all, the essence of this question.

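You are probably talking about the (?! construct.

It is documented in the javadoc of the Pattern class, where it is called a negative lookahead assertion.

The most straightforward way to solve the problem is to call find repeatedly.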
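To make the construct concrete, here is a minimal sketch of what a negative lookahead does. The class name `NegativeLookaheadDemo`, the helper `matchStarts`, and the sample sentence are made up for illustration; only `Pattern` and `Matcher` come from `java.util.regex`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NegativeLookaheadDemo {

    // Hypothetical helper: returns the start index of every match of regex in input.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "test testing tested";
        // Plain pattern: matches all three occurrences of "test".
        System.out.println(matchStarts("test", input));        // [0, 5, 13]
        // "test" only where it is NOT immediately followed by "ing".
        System.out.println(matchStarts("test(?!ing)", input)); // [0, 13]
    }
}
```

The lookahead itself consumes nothing: it only vetoes a match at positions where the forbidden text follows.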

    Pattern p = Pattern.compile(regexForThingIWant);
    Matcher m = p.matcher(str);
    int cursor = 0;
    while (m.find(cursor)) {
      String x = m.group();
      // do something with x
      cursor = m.end();
    }
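For completeness, a self-contained version of that loop might look like the sketch below. The class `FindAllDemo`, the method `findAll`, and the sample text are assumptions for illustration. This variant calls `find()` without an argument, which resumes after the previous match on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindAllDemo {

    // Hypothetical helper: collects every substring of input that matches regex.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // find() with no argument continues where the previous match ended.
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "This is a test. By the way, \"testing\" contains \"test\".";
        System.out.println(findAll("test", text)); // [test, test, test]
    }
}
```

One reason to prefer the no-argument `find()` here: if the pattern can match an empty string, restarting at `m.end()` with `find(cursor)` would keep finding the same empty match, whereas `find()` steps past it.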

I was able to cobble together a regex for split that seems to do what you want, but it is awful:

(^|(?<=test))((?!test).)*
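As a rough check of that pattern, the sketch below runs it through String.split() on a short sample. The class name `InverseSplitDemo`, the constant name, and the sample sentence are assumptions made for illustration, and the expected output assumes Java 8 or later.

```java
import java.util.Arrays;

final class InverseSplitDemo {

    // The pattern from the answer: starting at the beginning of the input or
    // right after "test", consume characters up to the next "test".
    static final String EVERYTHING_BUT_TEST = "(^|(?<=test))((?!test).)*";

    public static void main(String[] args) {
        String text = "a test b testing c";
        String[] parts = text.split(EVERYTHING_BUT_TEST);
        // The blank first element is there because the text does not start with "test".
        System.out.println(Arrays.toString(parts)); // [, test, test]
    }
}
```

Note that `.` does not match line terminators by default, so for multi-line input such as the question's text the pattern would probably need the `(?s)` flag; otherwise the line breaks stay in the result.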

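I am having trouble seeing what output you want from split, because your only hints are part of the test string, and indirect ones at that (such as wanting the word testing to be split in two).

OK, let's try a positive lookbehind: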

^|(?<=test)

This returns

This is a test
. This test
 should (?not?) match the word "test
", whenever it appears.
This is about to test
 if a "String.split()" can be used in a different way.
By the way, "test
ing" does not equal "test
",but it will split in the middle because it contains "test
".

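Is this what you want?

Note that when you split text in such a way that neither the "matching" nor the "non-matching" bits of the input (in the loose sense) are consumed by the splitting process, you need to design the regex so that it matches only (some) empty strings, in the technical sense of the word "match".

Lookaheads and lookbehinds are therefore pretty much the only tools for solving this kind of task with a regex.

However, if you would rather consume all the non-test parts, that is achievable too.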
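The difference between a consuming separator and a zero-width one can be seen in a small sketch. The class name `ZeroWidthSplitDemo` and the sample sentence are made up for illustration; the regex is a simplified variant of the lookbehind pattern above.

```java
import java.util.Arrays;

final class ZeroWidthSplitDemo {

    public static void main(String[] args) {
        String text = "a test b testing c";

        // Consuming separator: every "test" is swallowed by the split.
        String[] consumed = text.split("test");
        System.out.println(Arrays.toString(consumed)); // [a ,  b , ing c]

        // Zero-width separator: split at the empty string right after each "test".
        String[] kept = text.split("(?<=test)");
        System.out.println(Arrays.toString(kept));     // [a test,  b test, ing c]

        // Nothing was consumed, so gluing the pieces back together restores the input.
        System.out.println(String.join("", kept).equals(text)); // true
    }
}
```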

(?<=^|(test))(tes[^t]|te[^s]|t[^e]|[^t])*

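It is the same lookbehind, followed by consuming anything that does not look like the word test.

This approach is not fully general, though. This question explains its limitations.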
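A quick sketch of this pattern in use follows. The class name `ConsumingSplitDemo`, the constant name, and both sample inputs are assumptions for illustration. The second input shows one case where the hand-written alternation appears to go wrong, which may or may not be the limitation the answer has in mind.

```java
import java.util.Arrays;

final class ConsumingSplitDemo {

    // The pattern from the answer: after the start of input or after "test",
    // consume anything that does not spell out "test".
    static final String NOT_TEST = "(?<=^|(test))(tes[^t]|te[^s]|t[^e]|[^t])*";

    public static void main(String[] args) {
        // Simple input: only the occurrences of "test" survive the split.
        System.out.println(Arrays.toString("a test b testing c".split(NOT_TEST))); // [, test, test]

        // Overlapping prefix: "tet" is consumed as te[^s], which eats the first
        // letter of the real "test" and leaves only its final "t".
        System.out.println(Arrays.toString("tetest".split(NOT_TEST)));             // [, t]
    }
}
```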
