我正在创建一个霍夫曼树来压缩文本文件,但我遇到了一些问题。我所做的这个方法应该采用一个FileInputStream
,它输入文本数据并返回字符和计数的Map
。然而,要做到这一点,我需要定义byte[]
的大小来存储数据。问题是byte[]
数组的大小需要正好合适的长度,否则Map
也会有一些不需要的数据。有没有办法使byte[]
正好合适的大小?
// provides a count of characters in an input file and place in map
public static Map<Character, Integer> getCounts(FileInputStream input)
throws IOException {
Map<Character, Integer> output = new TreeMap<Character, Integer>(); // treemap keeps keys in sorted order (chars alphabetized)
byte[] fileContent = new byte[100]; // creates a byte[]
//ArrayList<Byte> test = new ArrayList<Byte>();
input.read(fileContent); // reads the input into fileContent
String test = new String(fileContent); // contains entire file into this string to process
// goes through each character of String to put chars as keys and occurrences as keys
for (int i = 0; i < test.length(); i++) {
char temp = test.charAt(i);
if (output.containsKey(temp)) { // seen this character before; increase count
int count = output.get(temp);
System.out.println("repeat; char is: " + temp + "count is: " + count);
output.put(temp, count + 1);
} else { // Haven't seen this character before; create count of 1
System.out.println("new; char is: " + temp + "count is: 1");
output.put(temp, 1);
}
}
return output;
}
FileInputStream.read()
的返回值是实际读取的字节数,在EOF的情况下是-1
。您可以在for循环中使用此值而不是test.length()
。
注意,read()
不能保证读取缓冲区长度的字节值,即使没有到达文件的末尾,所以它通常在循环中使用:
int bytesRead;
//Read until there is no more bytes to read.
while((bytesRead = input.read(buf))!=-1)
{
//You have next bytesRead bytes in a buffer here
}
最后,如果字符串是Unicode,这种方法将不起作用,因为read()
可以在字符中间终止。考虑使用InputStreamReader
来包装FileInputStream
:
Reader fileReader = new InputStreamReader(input, "UTF-8");
int charsRead;
char buf[] = new char[256];
while ((charsRead = fileReader.read(buf)) > 0) {
//You have charsRead characters in a buffer here
}