Matcher.appendReplacement does not add the leading content


import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestUtil {

    public static void main(String[] args) {
        StringBuffer test = new StringBuffer(); 
        test.append("abacbsidfslhfadskljfhdskh adsfkjlhdslkfhas lkajdsfhak dsjfhs akhasdf adsjkfh asldjkfhds glakdshgf dghkads ghklgh asdflkghadfkl <p rendition=\"#indent-1\">1. Geschl. <hi rendition=\"#r\"> <hi rendition=\"#smcap\"> <hi rendition=\"#wide\"><term xml:lang=\"la\">Homo</term></hi> </hi>. <term xml:lang=\"la\">Erectus</term>, <term xml:lang=\"la\">bimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>");
        test.append("bbbbbbbbbbbbbbbbbbbbbb <p rendition=\"#indent-1\">1. Geschl. <hi rendition=\"#r\"> <hi rendition=\"#smcap\"> <hi rendition=\"#wide\"><term xml:lang=\"la\">sHomo</term></hi> </hi>. <term xml:lang=\"la\">sErectus</term>, <term xml:lang=\"la\">sbimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>");
        // Outer pattern: one numbered <p rendition="#indent-1"> ... </p> paragraph.
        Pattern pattern = Pattern.compile("<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
                    Pattern.CASE_INSENSITIVE);
        Matcher regexMatcher = pattern.matcher(test.toString());
        System.out.println(test);
        test.delete(0, test.length());
        while (regexMatcher.find()) {
            //  test.delete(regexMatcher.start(),test.length());
                String matched = regexMatcher.group(0);
                // Groups: 1 = start tag without ">", 2 = ">", 3 = term text, 4 = end tag.
                Pattern termPatter = Pattern.compile("(<term xml:lang=\".*?\")(>)(.*?)(</term>)");
                Matcher termMatcher = termPatter.matcher(matched);
                if(termMatcher != null){
                    //termMatcher.start();
                    System.out.println(termMatcher.groupCount());
                    while (termMatcher.find()) {
                        System.out.println("0---"+termMatcher.group(0));
                        System.out.println(termMatcher.group(1));
                        System.out.println(termMatcher.group(2));
                        System.out.println(termMatcher.group(3));
                        System.out.println(termMatcher.group(4));
                        termMatcher.appendReplacement(test, appendSortKey(termMatcher.group(0),termMatcher.group(1),termMatcher.group(2),termMatcher.group(3),termMatcher.group(4)));
                    }
                    termMatcher.appendTail(test);
                }
                //regexMatcher.appendTail(test);
        }
        System.out.println(test);
    }
    private static String appendSortKey(String totStr, String termStart, String termStartEndTag, String termValue, String termEndTag) {
        // Insert a sortKey attribute holding the term's text before the ">" that closes the start tag.
        if(totStr!=null){
            termStart = termStart+" "+"sortKey=\""+termValue+"\""+termStartEndTag;
            return termStart+termValue+termEndTag;
        }
        return null;
    }
}

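I am trying to modify only the <term>...</term> elements, taking the text to process from the matcher of another regular expression (because the change is conditional on the enclosing paragraph), but the content at the beginning and end is lost. Please let me know what mistake I am making.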

The expected output is:

abacbsidfslhfadskljfhdskh adsfkjlhdslkfhas lkajdsfhak dsjfhs akhasdf adsjkfh asldjkfhds glakdshgf dghkads ghklgh asdflkghadfkl <p rendition="#indent-1">1. Geschl. <hi rendition="#r"> <hi rendition="#smcap"> <hi rendition="#wide"><term xml:lang="la" sortKey="Homo">Homo</term></hi> </hi>. <term xml:lang="la" sortKey="Erectus">Erectus</term>, <term xml:lang="la" sortKey="bimanus">bimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>bbbbbbbbbbbbbbbbbbbbbb <p rendition="#indent-1">1. Geschl. <hi rendition="#r"> <hi rendition="#smcap"> <hi rendition="#wide"><term xml:lang="la" sortKey="sHomo">sHomo</term></hi> </hi>. <term xml:lang="la" sortKey="sErectus">sErectus</term>, <term xml:lang="la" sortKey="sbimanus">sbimanus</term>. Mentoprominulo. Dentibus aequaliter approximatis; incisoribus inferioribus erectis.</hi> </p>

Change the following line of your code and it will give you the correct result.

Pattern pattern = Pattern.compile(".*<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
                Pattern.CASE_INSENSITIVE);
/*
instead of the following
Pattern pattern = Pattern.compile("<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>",
                Pattern.CASE_INSENSITIVE);
*/

Explanation:

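  • The pattern <p rendition="#indent-1">\d+\.\s*.*?</p> matches only the <p ...> ... </p> part. The buffer is filled solely by termMatcher, whose input is that matched part, so appendReplacement and appendTail copy only the <p ...> ... </p> part with its replacements; the text in front of it is never seen.
  • The pattern .*<p rendition="#indent-1">\d+\.\s*.*?</p> matches Text <p ...> ... </p>, so after appendReplacement you get Text <p ...> ... </p> with the replacements applied.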
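
The two cases above can be reproduced on a much smaller input. The following is a minimal sketch, not the original code: the class name AppendReplacementDemo, the helper replaceB, and the toy input are made up for illustration. It shows that appendReplacement and appendTail only ever copy text from the matcher's own input string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementDemo {

    // Hypothetical helper: replaces every "b" with "B" and returns the rebuilt string.
    static String replaceB(String input) {
        Matcher m = Pattern.compile("b").matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Copies the unmatched text of `input` before this match, then the replacement.
            m.appendReplacement(out, "B");
        }
        // Copies whatever is left of `input` after the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String full = "xx <p>abc</p> yy";
        Matcher outer = Pattern.compile("<p>.*?</p>").matcher(full);
        if (outer.find()) {
            // The inner matcher only sees "<p>abc</p>", so "xx " and " yy" are lost.
            System.out.println(replaceB(outer.group(0))); // prints <p>aBc</p>
        }
        // The inner matcher sees the whole input, so nothing is lost.
        System.out.println(replaceB(full)); // prints xx <p>aBc</p> yy
    }
}
```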

Therefore, the output will be the whole string, with each <term xml:lang="la">text</term> replaced by <term xml:lang="la" sortKey="text">text</term>.
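
Note that the .* prefix works for the sample input because that input is a single line ending in </p>, so the one match spans the whole string. Text on earlier lines, or text after the last </p>, would still be dropped, and <term> elements outside the paragraph but on the same line would also be rewritten. If only the terms inside matching paragraphs should change, one possible alternative (a sketch, not the approach from the answer above) is to let the outer matcher rebuild the string as well. The class name SortKeySketch and the methods addSortKeys and process are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SortKeySketch {

    // Same patterns as in the question (without the ".*" prefix).
    private static final Pattern PARAGRAPH = Pattern.compile(
            "<p rendition=\"#indent-1\">\\d+\\.\\s*.*?</p>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TERM = Pattern.compile(
            "(<term xml:lang=\".*?\")(>)(.*?)(</term>)");

    // Adds sortKey="..." to every <term> inside a single matched paragraph.
    static String addSortKeys(String paragraph) {
        Matcher term = TERM.matcher(paragraph);
        StringBuffer out = new StringBuffer();
        while (term.find()) {
            String replacement = term.group(1) + " sortKey=\"" + term.group(3) + "\""
                    + term.group(2) + term.group(3) + term.group(4);
            // quoteReplacement keeps "$" and "\" in the term text literal.
            term.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        term.appendTail(out);
        return out.toString();
    }

    // The outer matcher copies the text between paragraphs and the trailing text itself.
    static String process(String input) {
        Matcher paragraph = PARAGRAPH.matcher(input);
        StringBuffer out = new StringBuffer();
        while (paragraph.find()) {
            String rewritten = addSortKeys(paragraph.group(0));
            paragraph.appendReplacement(out, Matcher.quoteReplacement(rewritten));
        }
        paragraph.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "lead <p rendition=\"#indent-1\">1. <term xml:lang=\"la\">Homo</term></p>"
                + " <term xml:lang=\"la\">outside</term> tail";
        System.out.println(process(input));
    }
}
```

With this structure the leading text, the text between paragraphs, and the trailing text are all copied by the outer matcher, and a <term> outside any matching paragraph is left unchanged.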
