import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestUtil {
    public static void main(String[] args) {
        StringBuffer test = new StringBuffer();
        test.append("abacbsidfslhfadskljfhdskh adsfkjlhdslkfhas lkajdsfhak dsjfhs akhasdf adsjkfh asldjkfhds glakdshgf dghkads ghklgh asdflkghadfkl <p rendition=\"#indent-1\">1. Geschl. <hi rendition=\"#r\"> <hi rendition=\"#smcap\"> <hi rendition=\"#wide\"><term xml:lang=\"la\">Homo</term></hi> </hi>. <term xml:lang=\"la\">Erectus</term>, <term xml:lang=\"la\">bimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>");
        test.append("bbbbbbbbbbbbbbbbbbbbbb <p rendition=\"#indent-1\">1. Geschl. <hi rendition=\"#r\"> <hi rendition=\"#smcap\"> <hi rendition=\"#wide\"><term xml:lang=\"la\">sHomo</term></hi> </hi>. <term xml:lang=\"la\">sErectus</term>, <term xml:lang=\"la\">sbimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>");
        // Outer pattern: numbered paragraphs whose terms should be rewritten.
        Pattern pattern = Pattern.compile("<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
                Pattern.CASE_INSENSITIVE);
        Matcher regexMatcher = pattern.matcher(test.toString());
        System.out.println(test);
        test.delete(0, test.length());
        while (regexMatcher.find()) {
            // test.delete(regexMatcher.start(),test.length());
            String matched = regexMatcher.group(0);
            // Inner pattern: group 1 = opening tag without '>', 2 = '>', 3 = term text, 4 = closing tag.
            Pattern termPatter = Pattern.compile("(<term xml:lang=\".*?\")(>)(.*?)(</term>)");
            Matcher termMatcher = termPatter.matcher(matched);
            if (termMatcher != null) {
                //termMatcher.start();
                System.out.println(termMatcher.groupCount());
                while (termMatcher.find()) {
                    System.out.println("0---" + termMatcher.group(0));
                    System.out.println(termMatcher.group(1));
                    System.out.println(termMatcher.group(2));
                    System.out.println(termMatcher.group(3));
                    System.out.println(termMatcher.group(4));
                    termMatcher.appendReplacement(test, appendSortKey(termMatcher.group(0), termMatcher.group(1), termMatcher.group(2), termMatcher.group(3), termMatcher.group(4)));
                }
                termMatcher.appendTail(test);
            }
            //regexMatcher.appendTail(test);
        }
        System.out.println(test);
    }

    // Rebuilds the <term> element with a sortKey attribute holding the term text.
    private static String appendSortKey(String totStr, String termStart, String termStartEndTag, String termValue, String termEndTag) {
        if (totStr != null) {
            termStart = termStart + " " + "sortKey=\"" + termValue + "\"" + termStartEndTag;
            return termStart + termValue + termEndTag;
        }
        return null;
    }
}
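I am trying to manipulate only the <term>...</term> elements by taking the content from another regex's matcher (because the replacement is conditional), but I am losing the content at the beginning and the end. Please let me know what mistake I am making.
The expected output is: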
abacbsidfslhfadskljfhdskh adsfkjlhdslkfhas lkajdsfhak dsjfhs akhasdf adsjkfh asldjkfhds glakdshgf dghkads ghklgh asdflkghadfkl <p rendition="#indent-1">1. Geschl. <hi rendition="#r"> <hi rendition="#smcap"> <hi rendition="#wide"><term xml:lang="la" sortKey="Homo">Homo</term></hi> </hi>. <term xml:lang="la" sortKey="Erectus">Erectus</term>, <term xml:lang="la" sortKey="bimanus">bimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>bbbbbbbbbbbbbbbbbbbbbb <p rendition="#indent-1">1. Geschl. <hi rendition="#r"> <hi rendition="#smcap"> <hi rendition="#wide"><term xml:lang="la" sortKey="sHomo">sHomo</term></hi> </hi>. <term xml:lang="la" sortKey="sErectus">sErectus</term>, <term xml:lang="la" sortKey="sbimanus">sbimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>
Change the following line of your code and it will give you the correct result.
Pattern pattern = Pattern.compile(".*<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
        Pattern.CASE_INSENSITIVE);
/*
instead of the following:
Pattern pattern = Pattern.compile("<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
        Pattern.CASE_INSENSITIVE);
*/
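Explanation:
- The pattern
<p rendition="#indent-1">\d+\.\s*.*?</p>
matches only the <p ...> ... </p> part, so appendReplacement and appendTail on the inner matcher append only the <p ...> ... </p> part, with the replacements applied. The text in front of each paragraph is never seen by the inner matcher, so it is lost.
- The pattern
.*<p rendition="#indent-1">\d+\.\s*.*?</p>
matches Text <p ...> ... </p>, so after appendReplacement you get Text <p ...> ... </p> with the replacements applied. Because the leading .* is greedy, this single match runs from the start of the input to the end of the last matching paragraph.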
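The difference between the two patterns can be reproduced on a much shorter input. The following is only a minimal sketch: the class name AppendScopeDemo, the helper names addSortKeys and rewriteFirstMatch, and the shortened sample string are made up for illustration and are not part of the original code. It shows that the inner appendReplacement/appendTail loop can only emit text that belongs to the string its matcher was created on.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendScopeDemo {
    // Same group layout as the term pattern in the question.
    private static final Pattern TERM =
            Pattern.compile("(<term xml:lang=\".*?\")(>)(.*?)(</term>)");

    // Rewrites every <term> in `fragment`. The matcher only ever sees `fragment`,
    // so appendReplacement/appendTail cannot emit text that lies outside it.
    public static String addSortKeys(String fragment) {
        Matcher m = TERM.matcher(fragment);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1) + " sortKey=\"" + m.group(3) + "\""
                    + m.group(2) + m.group(3) + m.group(4);
            // quoteReplacement keeps '$' and '\' in the term text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Applies addSortKeys to the first match of `outerRegex` in `input`,
    // or returns null when there is no match.
    public static String rewriteFirstMatch(String outerRegex, String input) {
        Matcher outer = Pattern.compile(outerRegex).matcher(input);
        return outer.find() ? addSortKeys(outer.group(0)) : null;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical sample input.
        String input = "lead text <p rendition=\"#indent-1\">1. <term xml:lang=\"la\">Homo</term></p>";
        String narrow = "<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>";
        System.out.println(rewriteFirstMatch(narrow, input));        // "lead text " is missing
        System.out.println(rewriteFirstMatch(".*" + narrow, input)); // "lead text " is kept
    }
}
```

With the narrow pattern the inner matcher never sees "lead text ", which is exactly the content that goes missing in the question.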
So the output will be the whole string, with <term xml:lang="la">text</term> replaced by <term xml:lang="la" sortKey="text">text</term>.
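One caveat with the .* prefix: it works for the sample because a single greedy match happens to cover the whole input. Text after the last </p> would still be dropped, and terms outside a numbered paragraph would be rewritten too. If that matters, an alternative is to run appendReplacement/appendTail on the outer matcher and do the term rewrite inside each matched paragraph. This is only a sketch under that assumption, not the fix proposed above; the class name SortKeyRewriter and the method name addSortKeys are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SortKeyRewriter {
    // Outer pattern from the question, unchanged (no leading ".*").
    private static final Pattern PARAGRAPH =
            Pattern.compile("<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
                    Pattern.CASE_INSENSITIVE);
    // Term pattern from the question: 1 = opening tag without '>', 2 = '>',
    // 3 = term text, 4 = closing tag.
    private static final Pattern TERM =
            Pattern.compile("(<term xml:lang=\".*?\")(>)(.*?)(</term>)");

    public static String addSortKeys(String input) {
        Matcher paragraph = PARAGRAPH.matcher(input);
        StringBuffer out = new StringBuffer();
        while (paragraph.find()) {
            // Inner rewrite is confined to the matched paragraph.
            String rewritten = TERM.matcher(paragraph.group())
                    .replaceAll("$1 sortKey=\"$3\"$2$3$4");
            // The outer appendReplacement first copies the untouched text that
            // precedes this paragraph, then appends the rewritten paragraph.
            paragraph.appendReplacement(out, Matcher.quoteReplacement(rewritten));
        }
        // Copies whatever follows the last matched paragraph.
        paragraph.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Shortened, hypothetical sample input.
        String input = "before <p rendition=\"#indent-1\">1. <term xml:lang=\"la\">Homo</term></p>"
                + " middle <term xml:lang=\"la\">skip</term> after";
        System.out.println(addSortKeys(input));
    }
}
```

Here the outer matcher is responsible for copying everything between paragraphs, so leading, intermediate, and trailing text all survive, and only terms inside a numbered paragraph receive a sortKey.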