Regexp: capture everything after the uppercase words and before the digits

Test strings:

TEST Hello, world, 75793250
TEST TESTER Hello, world. Another word here. 75793250

Desired matches:

Hello, world, 
Hello, world. Another word here.

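I want to select everything between the uppercase words and the 8-digit number.

How can I do this?

Edit: the goal is to clean up a large text file with Notepad++. I am testing with Notepad++ and Rubular.com.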

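Try something like this:

/(?<=[A-Z]+(?: [A-Z]+)*\b)(?:(?!\b\d{8}).)*/

Basically:

Look behind for a run of uppercase words separated by single spaces, ending at a word boundary.
Then start matching from that point, one character at a time, until you reach a word boundary followed by 8 digits.

If your regex engine (like mine) complains about variable-length lookbehind, try this instead:

/(?:[A-Z]+(?: [A-Z]+)*\b)((?:(?!\b\d{8}).)*)/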

This yields:

>> "TEST Hello, world, 75793250".match /(?:[A-Z]+(?: [A-Z]+)*\b)((?:(?!\b\d{8}).)*)/
=> #<MatchData "TEST Hello, world, " 1:" Hello, world, ">
>> "TEST TESTER Hello, world. Another word here. 75793250".match /(?:[A-Z]+(?: [A-Z]+)*\b)((?:(?!\b\d{8}).)*)/
=> #<MatchData "TEST TESTER Hello, world. Another word here. " 1:" Hello, world. Another word here. ">
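The capture-group variant does not rely on lookbehind, so it should carry over to other engines. Here is a minimal Java sketch of the same pattern; the class and method names (`TemperedCapture`, `extract`) are made up for illustration, and it assumes input shaped like the test strings in the question. As in the Ruby output, group 1 keeps the space that follows the uppercase words.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class TemperedCapture {
    // Uppercase words separated by single spaces, then group 1: every
    // character up to (not including) a word boundary followed by 8 digits.
    private static final Pattern PATTERN =
        Pattern.compile("(?:[A-Z]+(?: [A-Z]+)*\\b)((?:(?!\\b\\d{8}).)*)");

    // Returns group 1 of the first match, or null when nothing matches.
    static String extract(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Brackets make the leading and trailing spaces visible.
        System.out.println("[" + extract("TEST Hello, world, 75793250") + "]");
        System.out.println("["
            + extract("TEST TESTER Hello, world. Another word here. 75793250") + "]");
    }
}
```

If the surrounding spaces are unwanted, calling `trim()` on the result is probably the simplest fix.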

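Try the following:

\b[A-Z]+\b\s+(.*)\d{8}

Modified to exclude the leading uppercase words. The text you are looking for is in capture group 1:

(?:\b[A-Z]+\b\s+)+(.*)\d{8}

If the uppercase words (tags) appear only at the start of the line:

^(?:\b[A-Z]+\b\s+)+(.*)\d{8}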
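To try the anchored pattern on a whole file rather than one line at a time, a sketch along these lines may help. It is written in Java only for illustration; `LineExtractor` and `extractAll` are hypothetical names, and the sketch assumes each record sits on its own line, so it compiles the pattern with `Pattern.MULTILINE` to let `^` match at every line start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class LineExtractor {
    // MULTILINE makes ^ match at the start of every line, not just the input.
    private static final Pattern LINE =
        Pattern.compile("^(?:\\b[A-Z]+\\b\\s+)+(.*)\\d{8}", Pattern.MULTILINE);

    // Collects capture group 1 from every matching line of the text.
    static List<String> extractAll(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "TEST Hello, world, 75793250\n"
            + "TEST TESTER Hello, world. Another word here. 75793250";
        for (String s : extractAll(text)) {
            System.out.println("[" + s + "]"); // brackets show trailing spaces
        }
    }
}
```

Because `(.*)` is greedy, it backtracks only as far as needed for `\d{8}` to match, so group 1 runs up to the last 8 digits on the line. On a line ending in more than 8 digits, the extra leading digits end up inside the capture.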

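You can use the following Java code:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    public class Main {
        public static void main(String[] args) {
            String str = "TEST TESTER Hello, world. Another word here. 75793250";
            Pattern pattern = Pattern.compile("(([A-Z]+\\s)+)([^\\n]*)([0-9]{8})");
            Matcher m = pattern.matcher(str);
            while (m.find()) {
                System.out.println(m.group(3)); // group 3: text between the tags and the digits
            }
        }
    }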

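Use a character class to build an atom that matches only uppercase letters: [A-Z]. You want to match it several times (at least once?), so [A-Z]+.

Next, you want to grab anything at all, which is .+, and you want to capture it, so wrap it in a named capture: (?<nameHere>.+).

Then you need to match the digits so that they end the capture and do not show up inside it (since .+ matches anything). \d is the shorthand character class for digits, and we want one or more of them, so \d+.

Put it all together, with whitespace (\s) between the parts:

[A-Z]+\s+(?<nameHere>.+)\s+\d+

Use the Match class (Match.Captures) to pull out the named capture.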
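The steps above can be sketched in Java as well, which also supports the `(?<name>...)` syntax and reads the group back with `Matcher.group(String)`. This is an illustration under assumptions, not the answer's own code: `NamedCapture` and `extract` are hypothetical names, and the input is assumed to look like the question's test strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class NamedCapture {
    // Uppercase word, whitespace, named capture, whitespace, trailing digits.
    private static final Pattern PATTERN =
        Pattern.compile("[A-Z]+\\s+(?<nameHere>.+)\\s+\\d+");

    // Returns the named group of the first match, or null when nothing matches.
    static String extract(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.find() ? m.group("nameHere") : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("TEST Hello, world, 75793250"));
        System.out.println(
            extract("TEST TESTER Hello, world. Another word here. 75793250"));
    }
}
```

One caveat: the pattern consumes only a single leading uppercase word, so for the second test string the capture still begins with "TESTER". Repeating the first part, for example `(?:[A-Z]+\s+)+`, would be one way to skip several tags.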
