An efficient way to unescape HTML escape characters without external libraries



Currently, when I want to convert HTML escape characters into a readable String, I use this method:

    public static String unescapeHTML(String text) {
        return text
                .replace("&trade;", "™")
                .replace("&euro;", "€")
                .replace("&nbsp;", "\u00A0")
                .replace("&#32;", " ")
                .replace("&#33;", "!")
                .replace("&#34;", "\"")
                .replace("&quot;", "\"")
                .replace("&#35;", "#")
                .replace("&#36;", "$")
                .replace("&#37;", "%")
                .replace("&#38;", "&")
                //and the rest of HTML escape characters
                .replace("&amp;", "&");
    }

My goal is to avoid external libraries such as Apache (class StringUtils). Since the list is long (more than 300 characters), what is the fastest way to replace them?

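Use Pattern and Matcher. If you want to avoid calculating and adjusting the buffer length, you can also store the length difference between each pair of strings in some data structure and use that instead of computing the buffer length at run time, something like { -4,-4,0,-4}. I used the buffer length here because it simply returns an instance variable.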

    import java.util.HashMap;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // An entity is '&', then one or more characters other than '&' and ';', then ';'
    private final static Pattern MY_PATTERN = Pattern.compile("&[^&;]+;");
    private final static HashMap<String, String> patterns = new HashMap<>();
    static{
        patterns.put("&amp;", "&");
        patterns.put("&#33;", "!");
        patterns.put("&#32;", " ");
        patterns.put("&#36;", "$");
    }
    public static StringBuffer escapeString(String text){
        StringBuffer buffer = new StringBuffer(text);
        Matcher m = MY_PATTERN.matcher(text);
        int modifiedLength = 0;
        while (m.find()) {
            String replacement = patterns.get(m.group());
            if (replacement == null) {
                continue; // Unknown entity: leave it unchanged
            }
            int tmpLength = buffer.length();
            // Match offsets refer to the original text, so shift them by the
            // number of characters removed by earlier replacements.
            buffer.replace(m.start()-modifiedLength, m.end()-modifiedLength, replacement);
            modifiedLength = modifiedLength + tmpLength-buffer.length();
        }
        return buffer;
    }
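The offset bookkeeping described above can also be left to the regex engine. The following is only a sketch of that idea, not the answerer's code: `Matcher.appendReplacement` and `Matcher.appendTail` copy the unmatched text and track positions themselves, so no `modifiedLength` counter is needed. The class name `RegexUnescaper`, the method name `unescape`, and the four-entry table are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the small entity table are illustrative only.
final class RegexUnescaper {
    // '&', then one or more characters that are not '&', ';' or whitespace, then ';'.
    private static final Pattern ENTITY = Pattern.compile("&[^&;\\s]+;");
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("&amp;", "&");
        ENTITIES.put("&#33;", "!");
        ENTITIES.put("&#32;", " ");
        ENTITIES.put("&#36;", "$");
    }

    static String unescape(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer(text.length());
        while (m.find()) {
            // Entities missing from the table are copied through unchanged.
            String replacement = ENTITIES.getOrDefault(m.group(), m.group());
            // quoteReplacement keeps '$' and '\' from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("Tom &amp; Jerry&#33; &unknown; costs &#36;5"));
    }
}
```

Because the input is scanned once, text such as `&amp;amp;` is decoded a single time to `&amp;` rather than collapsing to `&`, which a chain of `replace` calls can do if `&amp;` is not handled last.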

I decided to do it this way:

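    import java.io.IOException;
    import java.io.StringReader;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.logging.Level;
    import java.util.logging.Logger;

    private static final Map<Integer, Character> iMap = new HashMap<>();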
    static {//Code, like &#32; or &#032;
        iMap.put(32, ' ');
        iMap.put(33, '!');
        iMap.put(34, '"');
        iMap.put(35, '#');
        iMap.put(36, '$');
        iMap.put(37, '%');
        iMap.put(38, '&');
        //...
    }
    private static final Map<String, Character> sMap = new HashMap<>();
    static {//Entity Name
        sMap.put("&larr;", '←');
        sMap.put("&uarr;", '↑');
        sMap.put("&rarr;", '→');
        sMap.put("&darr;", '↓');
        sMap.put("&harr;", '↔');
        sMap.put("&spades;", '♠');
        sMap.put("&clubs;", '♣');
        sMap.put("&hearts;", '♥');
        //...
    }
    public static String unescapeHTML(String str) {
        StringBuilder sb = new StringBuilder(),
                tmp = new StringBuilder();
        StringReader sr = new StringReader(str);
        boolean esc = false;
        try {
            int i;
            while ((i = sr.read()) != -1) {
                char c = (char) i;
                if (c == '&') {
                    sb.append(tmp);//Flush an earlier, unterminated "&..." run unchanged
                    tmp.setLength(0);
                    tmp.append(c);
                    esc = true;
                } else if (esc) {
                    tmp.append(c);
                    if (c == ';') {
                        esc = false;
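                        Character decoded = null;
                        if (tmp.charAt(1) == '#') {
                            try {
                                //length(), not capacity(): capacity is the allocated size, not the content size
                                decoded = iMap.get(Integer.parseInt(tmp.substring(2, tmp.length() - 1)));
                            } catch (NumberFormatException ex) {
                                //Ignore and leave unchanged
                            }
                        } else {
                            decoded = sMap.get(tmp.toString());
                        }
                        if (decoded != null) {
                            sb.append(decoded.charValue());
                        } else {
                            sb.append(tmp);//Unknown entity: leave unchanged instead of appending "null"
                        }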
                        tmp.setLength(0);
                    }
                } else {
                    sb.append(c);
                }
            }
            sb.append(tmp);//Keep a trailing, unterminated "&..." run
            sr.close();
        } catch (IOException ex) {
            Logger.getLogger(UnescapeHTML.class.getName()).log(Level.SEVERE, null, ex);
        }
        return sb.toString();
    }

It works perfectly and the code is simple. I am still testing it and would be glad to hear your comments.
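One possible simplification, offered only as a sketch: numeric references such as `&#36;` or `&#x20AC;` already carry their code point, so they can be decoded arithmetically instead of being listed one by one in a map like `iMap`. The class `NumericEntityDecoder` and its methods below are hypothetical names; the sketch handles numeric references only and leaves named entities such as `&larr;` to a lookup table.

```java
// Hypothetical helper; decodes only numeric references such as &#36; or &#x20AC;.
final class NumericEntityDecoder {
    static String decode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int pos = 0;
        while (pos < text.length()) {
            int start = text.indexOf("&#", pos);
            int end = (start < 0) ? -1 : text.indexOf(';', start + 2);
            if (start < 0 || end < 0) {
                out.append(text, pos, text.length()); // no further complete reference
                break;
            }
            out.append(text, pos, start);             // literal text before the reference
            int codePoint = parse(text.substring(start + 2, end));
            if (codePoint >= 0) {
                out.appendCodePoint(codePoint);       // also covers characters beyond U+FFFF
                pos = end + 1;
            } else {
                out.append("&#");                     // not a valid reference: keep it as text
                pos = start + 2;                      // and rescan what follows
            }
        }
        return out.toString();
    }

    // Returns the code point for "36" or "x20AC", or -1 if the body is malformed.
    // Assumption: bodies longer than 7 digits are rejected to avoid int overflow.
    private static int parse(String body) {
        boolean hex = body.startsWith("x") || body.startsWith("X");
        String digits = hex ? body.substring(1) : body;
        int radix = hex ? 16 : 10;
        if (digits.isEmpty() || digits.length() > 7) {
            return -1;
        }
        int value = 0;
        for (int i = 0; i < digits.length(); i++) {
            int d = Character.digit(digits.charAt(i), radix);
            if (d < 0) {
                return -1;
            }
            value = value * radix + d;
        }
        return Character.isValidCodePoint(value) ? value : -1;
    }
}
```

With this in place, a table is needed only for named entities, and malformed input such as `&#xyz;` passes through unchanged.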
