Got an empty string when using String.replaceFirst(regex, "$1") to retrieve the matched substring: what is wrong with the regular expression?



I want to convert ANSI escape sequences to IRC color sequences.

So I wrote regular expression 1, \e\[([\d;]+)?m. However, shell_output_string.replaceFirst("\\e\\[([\\d;]+)?m", "$1") returns both the matched substring and the rest of the string that was not matched.

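Then I wrote regular expression 2, .*\e\[([\d;]+)?m.*, hoping it would match the whole string and replace it with the captured substring. However, replaceFirst(".*\\e\\[([\\d;]+)?m.*", "$1") returns an empty string, even though matches(".*\\e\\[([\\d;]+)?m.*") is true. What is wrong with this regular expression?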

The following question is very similar to this one: Pattern/Matcher group() to obtain substring in Java?

Sample code

import java.util.regex.*;
public class AnsiEscapeToIrcEscape
{
    public static void main (String[] args)
    {
//# grep --color=always bot /etc/passwd
//
//bot:x:1000:1000:bot:/home/bot:/bin/bash
byte[] shell_output_array = {
0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#1 - #11)
0x62, 0x6F, 0x74,   // bot  (#12 - #14)
0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#15 - #20)
0x3A, 0x78, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A,   // :x:1000:1000:    (#21 - #33)
0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#34 - #44)
0x62, 0x6F, 0x74,   // bot  (#45 - #47)
0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#48 - #53)
0x3A, 0x2F, 0x68, 0x6F, 0x6D, 0x65, 0x2F,   // :/home/  (#54 - #60)
0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#61 - #71)
0x62, 0x6F, 0x74,   // bot  (#72 - #74)
0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#75 - #80)
0x3A, 0x2F, 0x62, 0x69, 0x6E, 0x2F, 0x62, 0x61, 0x73, 0x68, // :/bin/bash   (#81 - #90)
};
        String shell_output = new String (shell_output_array);
        System.out.println (shell_output);
        System.out.println ("total " + shell_output_array.length + " bytes");
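        // Each regex backslash must be doubled inside a Java string literal.
        final String CSI_REGEXP = "\\e\\[";
        final String CSI_SGR_REGEXP_First = CSI_REGEXP + "([\\d;]+)?m";
        final String CSI_SGR_REGEXP = ".*" + CSI_SGR_REGEXP_First + ".*";
        // Regex 1: replaces only the first SGR sequence with its parameters.
        System.out.println (shell_output.replaceFirst(CSI_SGR_REGEXP_First, "$1"));
        // Regex 2: wrapped in .* so that it matches the whole string; prints an empty line.
        System.out.println (shell_output.replaceFirst(CSI_SGR_REGEXP, "$1"));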
    }
}

Regular expression quantifiers are greedy by default: each one tries to match as much of the input as possible.

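This means that when a pattern starts with .*, that part grabs as much of the input text as it can, which effectively forces the rest of the pattern to look for a match starting from the end of the input string.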
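The effect is easier to see on a short string. The following is a minimal sketch, not taken from the question: the class name GreedyDemo, the helper firstGroup, and the input "a1b22c333" are made up for illustration. It compares a greedy leading .* with the reluctant form .*?.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Hypothetical helper: returns group 1 of the first match of regex in
    // input, or null if there is no match or the group did not participate.
    public static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "a1b22c333";
        // Greedy .* takes the whole input, then gives back only as much as
        // needed, so (\d+) is left with just the last digit.
        System.out.println(firstGroup(".*(\\d+)", input));   // 3
        // Reluctant .*? grows only as far as needed, so (\d+) captures the
        // first run of digits.
        System.out.println(firstGroup(".*?(\\d+)", input));  // 1
    }
}
```

In both cases the overall match succeeds; only the position at which the rest of the pattern ends up matching differs.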

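So what is the first match for the rest of the pattern when starting from the end of the string (or, if you prefer, the last substring that matches)? It is in the second-to-last line of the input array, and consists only of ^[[m.

This matches because the entire ([\d;]+) part of the pattern is made optional by the ? that follows it.

In turn, since that last escape sequence contains no digits or ;, group $1 is empty, and so you get an empty string as output.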

At least, that is what I think happens; I am not near a Java machine to test it. Hope that helps.
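The explanation above can be checked with a small program. This is only a sketch under stated assumptions: the class name FirstSgrDemo, the helper firstSgrParams, and the shortened test string are hypothetical, and making the leading .* reluctant is one possible fix, not necessarily the one the asker settled on.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstSgrDemo {
    // Same SGR pattern as in the question, with backslashes doubled for Java.
    public static final String SGR = "\\e\\[([\\d;]+)?m";

    // Hypothetical helper: parameters of the first SGR sequence, "" if that
    // sequence has no parameters, null if the string has no SGR sequence.
    public static String firstSgrParams(String s) {
        Matcher m = Pattern.compile(SGR).matcher(s);
        if (!m.find()) {
            return null;
        }
        String params = m.group(1);   // null when the optional group is skipped
        return params == null ? "" : params;
    }

    public static void main(String[] args) {
        // Shortened version of the grep output: ESC[01;31m ESC[K bot ESC[m ESC[K :x
        String s = "\u001B[01;31m\u001B[Kbot\u001B[m\u001B[K:x";
        // Greedy prefix: the pattern lands on the last ESC[m, group 1 is unset.
        System.out.println("[" + s.replaceFirst(".*" + SGR + ".*", "$1") + "]");   // []
        // Reluctant prefix: the pattern lands on the first sequence.
        System.out.println("[" + s.replaceFirst(".*?" + SGR + ".*", "$1") + "]");  // [01;31]
        // Matcher.find() needs no .* wrapper at all.
        System.out.println("[" + firstSgrParams(s) + "]");                         // [01;31]
    }
}
```

Using find() and group(1) directly also makes the "no parameters" case explicit, since group(1) returns null there instead of silently producing an empty replacement.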

    The API documentation for String's replaceFirst says:

     replaceFirst
    public String replaceFirst(String regex,
                               String replacement)
        Replaces the first substring of this string that matches the given regular expression with the given replacement.
        An invocation of this method of the form str.replaceFirst(regex, repl) yields exactly the same result as the expression
            Pattern.compile(regex).matcher(str).replaceFirst(repl)
        Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceFirst(java.lang.String). Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.
        Parameters:
            regex - the regular expression to which this string is to be matched
            replacement - the string to be substituted for the first match 
        Returns:
            The resulting String 
        Throws:
            PatternSyntaxException - if the regular expression's syntax is invalid
        Since:
            1.4
        See Also:
            Pattern

Please read the Note part, which says that \ and $ in the replacement string may cause the result to be different.
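To illustrate that note, here is a small sketch of how $1 behaves in a replacement string with and without Matcher.quoteReplacement. The class name ReplacementDemo, its two helper methods, and the sample text are made up for this illustration.

```java
import java.util.regex.Matcher;

public class ReplacementDemo {
    // "$1" in the replacement refers to capturing group 1 of the match.
    public static String withGroupReference(String s) {
        return s.replaceFirst("(\\d+)", "<$1>");
    }

    // quoteReplacement escapes $ and \ so the replacement is used literally.
    public static String withLiteralReplacement(String s) {
        return s.replaceFirst("(\\d+)", Matcher.quoteReplacement("<$1>"));
    }

    public static void main(String[] args) {
        String s = "price: 5";
        System.out.println(withGroupReference(s));      // price: <5>
        System.out.println(withLiteralReplacement(s));  // price: <$1>
    }
}
```

In the question the group reference is intended, so quoting the replacement would not by itself change which part of the input the pattern matches.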
You can use Pattern and Matcher instead.
Example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){
      // String to be scanned to find the pattern.
     // String line = "This order was placed for QT3000! OK?";
     // String pattern = "(.*)(\d+)(.*)";
      byte[] shell_output_array = {
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#1 - #11)
              0x62, 0x6F, 0x74,   // bot  (#12 - #14)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#15 - #20)
              0x3A, 0x78, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A, 0x31, 0x30, 0x30, 0x30, 0x3A,   // :x:1000:1000:    (#21 - #33)
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#34 - #44)
              0x62, 0x6F, 0x74,   // bot  (#45 - #47)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#48 - #53)
              0x3A, 0x2F, 0x68, 0x6F, 0x6D, 0x65, 0x2F,   // :/home/  (#54 - #60)
              0x1B, 0x5B, 0x30, 0x31, 0x3B, 0x33, 0x31, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[01;31m^[[K  (#61 - #71)
              0x62, 0x6F, 0x74,   // bot  (#72 - #74)
              0x1B, 0x5B, 0x6D, 0x1B, 0x5B, 0x4B, // ^[[m^[[K (#75 - #80)
              0x3A, 0x2F, 0x62, 0x69, 0x6E, 0x2F, 0x62, 0x61, 0x73, 0x68, // :/bin/bash   (#81 - #90)
              };
      String line = new String (shell_output_array);
      //String pattern = "(.*)(\d+)(.*)";
      // Regex backslashes must be doubled inside a Java string literal.
      final String CSI_REGEXP = "\\e\\[";
      final String CSI_SGR_REGEXP_First = CSI_REGEXP + "([\\d;]+)?m";
      final String CSI_SGR_REGEXP = ".*" + CSI_SGR_REGEXP_First + ".*";
      // Create a Pattern object
      Pattern r = Pattern.compile(CSI_SGR_REGEXP);
      // Now create matcher object.
      Matcher m = r.matcher(line);
      while (m.find()) {
         System.out.println(m.start() + "  " + m.end());
         System.out.println("Found value: " + m.group());
      } 
   }
}
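Because the pattern in the example above is wrapped in .*, find() reports a single match that spans the whole line. If the goal is to visit every SGR sequence, one option is to loop with the unwrapped pattern and read group(1) each time. The sketch below assumes that approach; the class name SgrScanner, the method sgrParams, and the shortened input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SgrScanner {
    // Unwrapped SGR pattern from the question: ESC [ optional-parameters m
    private static final Pattern SGR = Pattern.compile("\\e\\[([\\d;]+)?m");

    // Collects the parameter string of every SGR sequence, in order.
    // A sequence without parameters (ESC[m) is recorded as "".
    public static List<String> sgrParams(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = SGR.matcher(s);
        while (m.find()) {
            String params = m.group(1);   // null when the optional group is skipped
            result.add(params == null ? "" : params);
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "\u001B[01;31m\u001B[Kbot\u001B[m\u001B[K:x:1000";
        System.out.println(sgrParams(s));   // [01;31, ]
    }
}
```

The ESC[K sequences are skipped because they end in K rather than m, so they would need a pattern of their own.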
