The following expression:
^(#ifdef FEATURE)+?\s*$((\r\n.*?)*^(#endif)+\s*[//]*\s*(end of)*\s*FEATURE)+?$
causes a stack overflow when I run my compiled .jar file.
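The strings to be matched look something like this:
this is a junk line
#ifdef FEATURE
#endif // end of FEATURE
this is a junk line
#ifdef FEATURE
this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h
#endif FEATURE
this is a junk line
So the #ifdef ... #endif blocks should be matched. The error is as follows: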
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
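Any backtracking-avoidance strategy or improved expression is welcome. I have already tried atomic groups (?>), but for some reason that did not make things any simpler.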
The code is as follows:
public String strip(String text) {
    // Each entry is used as a feature name in the pattern below.
    ArrayList<String> patterns = readFile("Disabled_Features.txt");
    for (int i = 0; i < patterns.size(); ++i)
    {
        Pattern todoPattern = Pattern.compile("^#ifdef "+patterns.get(i)+"((?:\r?\n(?!#endif (?:// end of )?"+patterns.get(i)+"$).*)*)\r?\n#endif (?:// end of )?"+patterns.get(i)+"$",Pattern.MULTILINE);
        Matcher m = todoPattern.matcher(text);
        // Remove every matched #ifdef ... #endif block for this feature.
        text = m.replaceAll("");
    }
    return text;
}
I have tried the code written by @Wiktor and it works fine:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestRegex {
    public static void main(String[] args) {
        String text = "this is a junk line\n" +
                "\n" +
                "#ifdef FEATURE \n" +
                "#endif // end of FEATURE\n" +
                "\n" +
                "this is a junk line\n" +
                "\n" +
                "#ifdef FEATURE\n" +
                "\n" +
                "this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h\n" +
                "\n" +
                "#endif FEATURE\n" +
                "\n" +
                "this is a junk line";
        // This version does not use Pattern.MULTILINE, which should reduce the backtracking.
        Matcher matcher2 = Pattern.compile("\n#ifdef FEATURE((?:\r?\n(?!#endif (?:// end of )?FEATURE).*)*)\r?\n#endif (?:// end of )?FEATURE").matcher(text);
        while (matcher2.find()) {
            System.out.println(matcher2.group());
        }
    }
}
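This makes me think your problem is caused by the size of the input file.
So if your file is too large, you can implement the input as a CharSequence, which lets you wrap your large text file. Why? Because building a Matcher from a Pattern takes a CharSequence as its argument.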
https://github.com/fge/largetext
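To make the CharSequence idea concrete, here is a minimal sketch. It is not the largetext library's API: the class name CharSequenceDemo, the nested ArrayChars view, and the helpers wrap and countMatches are all made up for illustration, and a real wrapper for a large file would page data in from disk rather than hold a char array. It only shows that Pattern.matcher accepts any CharSequence implementation, not just String.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharSequenceDemo {

    /** Hypothetical read-only view over part of a char array. */
    private static final class ArrayChars implements CharSequence {
        private final char[] data;
        private final int start;
        private final int end;

        ArrayChars(char[] data, int start, int end) {
            this.data = data;
            this.start = start;
            this.end = end;
        }

        @Override
        public int length() {
            return end - start;
        }

        @Override
        public char charAt(int index) {
            if (index < 0 || index >= length()) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return data[start + index];
        }

        @Override
        public CharSequence subSequence(int from, int to) {
            if (from < 0 || to > length() || from > to) {
                throw new IndexOutOfBoundsException();
            }
            // Shares the backing array instead of copying it.
            return new ArrayChars(data, start + from, start + to);
        }

        @Override
        public String toString() {
            // Matcher.group() relies on subSequence(...).toString().
            return new String(data, start, end - start);
        }
    }

    /** Wraps the whole array as a CharSequence (illustrative helper). */
    public static CharSequence wrap(char[] data) {
        return new ArrayChars(data, 0, data.length);
    }

    /** Counts matches of regex in any CharSequence (illustrative helper). */
    public static int countMatches(String regex, CharSequence input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        CharSequence input = wrap("x\n#ifdef FEATURE\n#endif FEATURE\n".toCharArray());
        System.out.println(countMatches("#ifdef FEATURE", input));
    }
}
```

Note that the wrapper changes only how the characters are stored; the regex engine runs the same matching logic over it.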
Update:
I tried implementing Wiktor's solution:
"^#ifdef "+patterns.get(i)+"((?:\r?\n(?!#endif (?:// end of )?"+patterns.get(i)+"$).*)*)\r?\n#endif (?:// end of )?"+patterns.get(i)+"$"
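It only captures the second block, but not the following block:
#ifdef FEATURE
junk captured text
#endif // end of FEATURE
In any case, it still overflows when I run the jar.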
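One backtracking-avoidance strategy that sidesteps the regex engine for the block matching is to scan the text line by line. The sketch below is a different technique from Wiktor's pattern, not a fix of it, and rests on several assumptions: FeatureStripper and isEndif are hypothetical names, the feature names are passed in as a list rather than read from Disabled_Features.txt, blocks for the same feature are not nested, and line endings in the result are normalized to \n. Lines are trimmed before comparison, so a trailing space after #ifdef FEATURE (as in the test string above) does not stop a block from being recognized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FeatureStripper {

    /** True if the trimmed line closes the block of the given feature. */
    static boolean isEndif(String trimmed, String feature) {
        String prefix = "#endif";
        if (!trimmed.startsWith(prefix) || !trimmed.endsWith(feature)
                || trimmed.length() < prefix.length() + feature.length()) {
            return false;
        }
        // Whatever sits between "#endif" and the feature name, minus whitespace.
        String middle = trimmed
                .substring(prefix.length(), trimmed.length() - feature.length())
                .replaceAll("\\s", "");
        return middle.isEmpty() || middle.equals("//") || middle.equals("//endof");
    }

    /** Removes every "#ifdef NAME ... #endif [// end of] NAME" block. */
    public static String strip(String text, List<String> features) {
        List<String> kept = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // lines of a block not yet closed
        String open = null;                       // feature of the block we are inside
        for (String line : text.split("\\r?\\n", -1)) {
            String trimmed = line.trim();
            if (open == null) {
                for (String f : features) {
                    if (trimmed.equals("#ifdef " + f)) {
                        open = f;
                        break;
                    }
                }
                if (open != null) {
                    pending.add(line);
                } else {
                    kept.add(line);
                }
            } else if (isEndif(trimmed, open)) {
                pending.clear(); // the block is complete, so drop all of it
                open = null;
            } else {
                pending.add(line);
            }
        }
        kept.addAll(pending); // an unterminated block is left in the text
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        String text = "this is a junk line\n"
                + "#ifdef FEATURE \n"
                + "#endif // end of FEATURE\n"
                + "this is a junk line\n"
                + "#ifdef FEATURE\n"
                + "HOLasduiqwhei & // FEATURE fjfefj #endif // h\n"
                + "#endif FEATURE\n"
                + "this is a junk line";
        System.out.println(strip(text, Arrays.asList("FEATURE")));
    }
}
```

Because the loop is iterative, the call stack does not grow with the number of lines inside a block, which is where the recursive Pattern$Loop / Pattern$GroupTail frames in the trace above come from.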