Java reading a text file by character - Unicode 254 and 255 appear out of nowhere

With the code below, I read a text file that contains only the character "a" (Unicode 97).

int ini;
// Read the text file one character at a time through the BufferedReader (jer)
while ((ini = jer.read()) != -1) {
    char inp = (char) ini;
    System.out.println(inp);
    // Count how often each character occurs in the map (listahan)
    if (listahan.containsKey(inp)) {
        listahan.put(inp, listahan.get(inp) + 1);
    } else {
        listahan.put(inp, 1);
    }
}
// Enhanced for loop that prints the counts to the console
for (Map.Entry<Character, Integer> e : listahan.entrySet()) {
    System.out.printf("%1d.) %-15s : %-3d%n", ctr++, e.getKey(), e.getValue());
}

The output is:

1.)                 : 1  // (must be a null)
2.) a               : 1  
3.) þ               : 1  
4.) ÿ               : 1

Why isn't the output this instead?

1.) a                 :1

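You have run into a byte order mark, U+FEFF. In UTF-16 it is stored as the two bytes 0xFE and 0xFF (in either order, depending on endianness), and when those bytes are read as separate characters they show up as 254 and 255.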
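To make the byte values concrete, here is a minimal sketch. The class and method names (`BomBytesDemo`, `bomBytes`, `misread`) are made up for illustration, and ISO-8859-1 stands in for whatever single-byte charset the original reader happened to use, which the question does not state.

```java
import java.nio.charset.StandardCharsets;

public class BomBytesDemo {
    // Encodes U+FEFF as UTF-16 big-endian and returns the unsigned byte values.
    public static int[] bomBytes() {
        byte[] raw = "\uFEFF".getBytes(StandardCharsets.UTF_16BE);
        int[] values = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            values[i] = raw[i] & 0xFF; // mask so we get 0..255 instead of a signed byte
        }
        return values;
    }

    // Decodes the same two bytes one byte per char, as a single-byte charset would.
    // ISO-8859-1 is an assumption; it maps every byte to the code point of the same value.
    public static String misread() {
        byte[] raw = "\uFEFF".getBytes(StandardCharsets.UTF_16BE);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        int[] values = bomBytes();
        System.out.println(values[0] + " " + values[1]); // prints "254 255"
        System.out.println(misread());                   // the two characters þ and ÿ
    }
}
```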

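This (together with the null that shows up) probably means the file is encoded as UTF-16 or UCS-2 (a.k.a. wide strings, wchar, and so on). If you don't know what that means, I suggest reading up on Unicode encodings. For that I recommend Joel Spolsky's excellent article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)".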
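If the file really is UTF-16 with a byte order mark, one possible fix is to tell the reader so explicitly. The sketch below assumes that encoding; the class name `Utf16CharCount` and the sample bytes are hypothetical, chosen to mimic a little-endian UTF-16 file that contains only "a". Java's `UTF_16` charset uses the BOM to pick the byte order and drops it from the decoded text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class Utf16CharCount {
    // Counts characters from a stream that is assumed to hold UTF-16 text.
    public static Map<Character, Integer> countChars(InputStream in) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_16))) {
            int code;
            while ((code = reader.read()) != -1) {
                counts.merge((char) code, 1, Integer::sum); // add 1 to the existing count
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical file content: BOM (FF FE, little-endian) followed by 'a' (61 00).
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x61, 0x00};
        System.out.println(countChars(new ByteArrayInputStream(sample))); // prints {a=1}
    }
}
```

For a real file, the same method could be fed `Files.newInputStream(path)` instead of the in-memory sample. If the file turns out to use a different encoding, the charset argument is the only part that needs to change.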
