Java regex to identify bug references in GitHub commit messages



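I need to capture all bug numbers referenced in commit messages on GitHub.

A bug number is an integer, and a reference starts with one of close/closes/closed/closing, fix/fixes/fixed/fixing, or resolve/resolves/resolved/resolving, followed by #xyz, where xyz is the bug number.

Here is an example message together with what I have tried: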

String commitMessage = "This fixes #23 fixed#24 fix #25, #26 resolves #27 #28#29 resolved#30 #31 ,  #32. Also see #33";
String regex = "clos(e|es|ed|ing) ?#[0-9]+" 
        + "|fix(es|ed|ing)? ?#[0-9]+" 
        + "|resolv(e|es|ed|ing) ?#[0-9]+";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(commitMessage);
while (m.find()){
    System.out.println(m.group(0));
}

The output is:

fixes #23
fixed#24
fix #25
resolves #27
resolved#30

but I need it to be:

fixes #23
fixed #24
fix #25, #26
resolves #27 #28#29
resolved#30 #31 ,  #32

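Note that a reference may point to a single bug (e.g. #23) or to several bugs at once (e.g. #25, #26).

Note also that when several bugs are referenced, the bug numbers may be separated by one or more spaces and/or commas.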

You can add [\s\p{P}]* before the # in the regex to match zero or more whitespace or punctuation characters, and you can also contract the pattern a little:

String regex = "(?:(?:clos|resolv)(?:e|es|ed|ing)|fix(?:es|ed|ing)?)(?:[\\s\\p{P}]*#[0-9]+)+";

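The main difference is that (?:[\s\p{P}]*#[0-9]+)+ matches one or more occurrences of:

  • [\s\p{P}]* - zero or more whitespace or punctuation characters
  • # - a hash sign
  • [0-9]+ - one or more digits.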

See the Java demo:

String commitMessage = "This fixes #23 fixed#24 fix #25, #26 resolves #27 #28#29 resolved#30 #31 ,  #32. Also see #33";
String regex = "(?:(?:clos|resolv)(?:e|es|ed|ing)|fix(?:es|ed|ing)?)(?:[\\s\\p{P}]*#[0-9]+)+";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(commitMessage);
while (m.find()){
    System.out.println(m.group(0));
}

Output:

fixes #23
fixed#24
fix #25, #26
resolves #27 #28#29
resolved#30 #31 ,  #32
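If the goal is the bug numbers themselves rather than the matched text, a second pass over each match can pull out the digits. The following is only a sketch under that assumption; the class name BugNumberExtractor and the method extractBugNumbers are hypothetical names chosen for illustration and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BugNumberExtractor {
    // Same pattern as in the demo above: a keyword followed by one or more #numbers
    private static final Pattern REFERENCE = Pattern.compile(
            "(?:(?:clos|resolv)(?:e|es|ed|ing)|fix(?:es|ed|ing)?)(?:[\\s\\p{P}]*#[0-9]+)+");
    // Picks the digits out of each #number inside a matched reference
    private static final Pattern NUMBER = Pattern.compile("#([0-9]+)");

    // Hypothetical helper: returns every referenced bug number, in order of appearance
    static List<Integer> extractBugNumbers(String commitMessage) {
        List<Integer> numbers = new ArrayList<>();
        Matcher reference = REFERENCE.matcher(commitMessage);
        while (reference.find()) {
            Matcher number = NUMBER.matcher(reference.group());
            while (number.find()) {
                numbers.add(Integer.parseInt(number.group(1)));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        String commitMessage = "This fixes #23 fixed#24 fix #25, #26 resolves #27 #28#29 "
                + "resolved#30 #31 ,  #32. Also see #33";
        System.out.println(extractBugNumbers(commitMessage));
    }
}
```

With the sample message this should collect 23 through 32 and leave out #33, because "Also see" is not one of the keywords.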

You can use the following regex:

clos(e|es|ed|ing)([ ,]*#[0-9]+)+ ?|fix(es|ed|ing)?([ ,]*#[0-9]+)+ ?|resolv(e|es|ed|ing)([ ,]*#[0-9]+)+ ?

Here is a working example:
https://regex101.com/r/in7cox/1
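Since the link only shows the pattern in an online tester, here is a minimal Java sketch of the same regex. The class name CommaSpaceBugRefs and the method findReferences are made up for illustration. Note that the trailing " ?" in the pattern can pull one space into each match, so the sketch trims every result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommaSpaceBugRefs {
    // The regex from this answer, unchanged, split over three lines for readability
    private static final Pattern REFS = Pattern.compile(
            "clos(e|es|ed|ing)([ ,]*#[0-9]+)+ ?"
            + "|fix(es|ed|ing)?([ ,]*#[0-9]+)+ ?"
            + "|resolv(e|es|ed|ing)([ ,]*#[0-9]+)+ ?");

    // Hypothetical helper: returns each matched reference with the optional trailing space removed
    static List<String> findReferences(String commitMessage) {
        List<String> found = new ArrayList<>();
        Matcher m = REFS.matcher(commitMessage);
        while (m.find()) {
            found.add(m.group().trim());
        }
        return found;
    }

    public static void main(String[] args) {
        String commitMessage = "This fixes #23 fixed#24 fix #25, #26 resolves #27 #28#29 "
                + "resolved#30 #31 ,  #32. Also see #33";
        for (String reference : findReferences(commitMessage)) {
            System.out.println(reference);
        }
    }
}
```

Because [ ,]* accepts only spaces and commas, a tab, a line break, or any other punctuation between two numbers ends the list at that point.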

I would use two regexes (and two while loops). I would also use named capturing groups to make the code more readable and easier to maintain:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class GitHubBugTrackingRegex {
    public static void main(String[] args) {
        String commitMessage = "This fixes #23 fixed#24 fix #25, #26 "
                + "resolves #27 #28#29 resolved#30 #31 ,  #32. Also see #33";
        String regexBugReference    = "(?<oneBug>#\\d+)";
        String regexBugReferences   = "(?<someBugs>(\\s*,*\\s*" + regexBugReference + "\\s*)+)";
        String regex = 
                "(?<oneCase>(?<resolution>clos(e|es|ed|ing)|fix(|es|ed|ing)|resolv(e|es|ed|ing))"   
                        + regexBugReferences
                        + ")";
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(commitMessage);
        while (m.find()){
            String resolution   = m.group("resolution");
            String someBugs     = m.group("someBugs");
            Pattern p2 = Pattern.compile(regexBugReference);
            Matcher m2 = p2.matcher(someBugs);
            StringBuilder sb = new StringBuilder();
            String comma = "";      // first time special
            while (m2.find()) {
                String oneBug = m2.group("oneBug");
                sb.append(comma + oneBug);
                comma = ", ";       // second time and onwards
            }
            System.out.format("%8s %s%n", resolution, sb.toString());
        }
    }
}

The output of this code is:

   fixes #23
   fixed #24
     fix #25, #26
resolves #27, #28, #29
resolved #30, #31, #32
