I'm writing a preprocessor in Java for a language that supports traditional C-style block comments ("block quotes"), of the form:
/* Block quote on a single line. */
or:
/*
Block quote on multiple lines.
*/
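The preprocessor's first task is to "collapse" these block comments into zero-length strings while preserving any newlines inside them. This is necessary so that any compile error can point the programmer back to the correct line number in the original (pre-preprocessing!) source code.
Right now, my preprocessor finds and collapses block comments with Java pattern matching: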
String open = "/\\*"; // the initial slash-star (backslash doubled for the Java string literal)
String body = "(.*?)"; // any char, zero or more, reluctantly
String close = "\\*/"; // the ending star-slash
Pattern p = Pattern.compile(open + body + close, Pattern.DOTALL);
Matcher m = p.matcher(sourceCode);
sourceCode = m.replaceAll("");
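However, this approach does not preserve line numbers: the newlines inside a multi-line block comment are deleted along with it, so every line after the comment shifts up. Thus: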
1: /*
2: Multiline block quote.
3: */
4:
5: println("I was coded on line 5.");
is preprocessed into:
1:
2:
3: println("I was coded on line 5.");
but it should end up as:
1:
2:
3:
4:
5: println("I was coded on line 5.");
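The shift can be reproduced with a small check. This is only a sketch: the class name `LineCountDemo` and the helpers `countLines` and `stripNaively` are hypothetical names made up for illustration, and lines are counted simply as the number of `\n` characters plus one.

```java
import java.util.regex.Pattern;

public class LineCountDemo {
    // Counts lines as "number of '\n' characters, plus one" (an assumption of this sketch).
    static long countLines(String s) {
        return s.chars().filter(c -> c == '\n').count() + 1;
    }

    // The naive approach: delete every block comment outright, newlines included.
    static String stripNaively(String source) {
        return Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL)
                      .matcher(source)
                      .replaceAll("");
    }

    public static void main(String[] args) {
        String source = "/*\nMultiline block quote.\n*/\n\nprintln(\"I was coded on line 5.\");";
        String stripped = stripNaively(source);
        // Prints: 5 lines before, 3 lines after
        System.out.println(countLines(source) + " lines before, "
                + countLines(stripped) + " lines after");
    }
}
```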
Is there a way to collapse block comments while preserving line numbers?
In the end, I wrote a custom method:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.IntStream;

//compressBlockComments(String)
private String compressBlockComments(String source) {
    //Backslashes are doubled so the regex engine sees /\*[\s\S]*?\*/
    Matcher m = Pattern.compile("/\\*[\\s\\S]*?\\*/").matcher(source);
    StringBuilder sb = new StringBuilder(source);
    int offset = 0; //keep track of how many chars have been removed
    while (m.find()) { //Loop through pattern matches
        String blockComment = m.group();
        IntStream onlyNewlines = blockComment.chars().filter(c -> c == '\n');
        int countOfNewlines = (int) onlyNewlines.count();
        String compressed = "\n".repeat(countOfNewlines); //empty string if count == 0 (Java 11+)
        //Replace the block comment with a compressed string, offsetting
        //to account for the number of chars already removed.
        sb.replace(m.start() - offset, m.end() - offset, compressed);
        //Increment offset with the number of chars removed in this iteration.
        offset += blockComment.length() - countOfNewlines;
    } //while
    return sb.toString();
}
This proved to work in my tests on source files containing block comments of varying lengths and contents.
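For what it's worth, the offset bookkeeping can probably be avoided on Java 9 or later, where `Matcher.replaceAll` also accepts a function from each `MatchResult` to its replacement string. The following is a minimal sketch of that variant, not the method I actually tested above; the class name `BlockCommentCollapser` and method name `collapse` are hypothetical, and it assumes lines are separated by `\n` (with `\r\n` input, the `\r` characters inside comments would be dropped).

```java
import java.util.regex.Pattern;

public class BlockCommentCollapser {
    // Slash-star, anything (reluctantly, across lines thanks to DOTALL), star-slash.
    private static final Pattern BLOCK_COMMENT =
            Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    // Replaces each block comment with just the newlines it contained,
    // so every following line keeps its original line number.
    static String collapse(String source) {
        return BLOCK_COMMENT.matcher(source).replaceAll(match -> {
            long newlines = match.group().chars().filter(c -> c == '\n').count();
            // A run of '\n' contains no '$' or '\', so it is safe as a replacement string.
            return "\n".repeat((int) newlines);
        });
    }

    public static void main(String[] args) {
        String source = "/*\nMultiline block quote.\n*/\n\nprintln(\"I was coded on line 5.\");";
        // The println statement stays on line 5 of the output.
        System.out.println(collapse(source));
    }
}
```

Like the regex approaches above, this sketch does not know about string literals, so a `/*` inside a quoted string would still be treated as the start of a comment.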