Create StringBuilder from byte[]



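Is there a way to create a StringBuilder from a byte[]?

I want to use StringBuilder to improve memory usage, but what I start with is a byte[], so I have to create a String from the byte[] and then create the StringBuilder from that String. I don't think this solution is optimal.

Thanks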

Basically, your best option seems to be to use CharsetDecoder directly.

Here is how:

byte[] srcBytes = getYourSrcBytes();
// Whatever charset your bytes are encoded in
Charset charset = Charset.forName("UTF-8");
CharsetDecoder decoder = charset.newDecoder();
// ByteBuffer.wrap simply wraps the byte array; it does not allocate new memory for it
ByteBuffer srcBuffer = ByteBuffer.wrap(srcBytes);
// Decode srcBuffer into a new CharBuffer (new memory is allocated here; that cannot be avoided)
CharBuffer resBuffer = decoder.decode(srcBuffer);
// CharBuffer implements the CharSequence interface, which StringBuilder's constructor and append methods accept
StringBuilder yourStringBuilder = new StringBuilder(resBuffer);
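One detail worth knowing about this approach: a decoder obtained from newDecoder() reports malformed input by default, so decode(ByteBuffer) throws a CharacterCodingException where new String(bytes, charset) would silently substitute a replacement character. Here is a minimal, self-contained sketch of that behavior. The class name StrictDecodeDemo and the helper toStringBuilder are made up for illustration, and UTF-8 input is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class StrictDecodeDemo {
    // Hypothetical helper: decodes UTF-8 bytes into a StringBuilder without an intermediate String.
    static StringBuilder toStringBuilder(byte[] bytes) throws CharacterCodingException {
        // newDecoder() defaults to CodingErrorAction.REPORT, so bad input raises an exception.
        return new StringBuilder(
                StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes)));
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] ok = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        System.out.println(toStringBuilder(ok).length());

        byte[] bad = {(byte) 0xC3}; // first byte of a two-byte sequence, with nothing after it
        try {
            toStringBuilder(bad);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("malformed input rejected: " + e);
        }
    }
}
```

If you would rather get replacement characters than an exception, configure the decoder with onMalformedInput(CodingErrorAction.REPLACE) before decoding.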

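Addendum:

After some testing, it seems that a plain new String(bytes) is much faster, and there does not seem to be a simple way to beat it. Here is the test I ran: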

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.text.ParseException;
public class ConsoleMain {
    public static void main(String[] args) throws IOException, ParseException {
        StringBuilder sb1 = new StringBuilder("abcdefghijklmnopqrstuvwxyz");
        for (int i=0;i<19;i++) {
            sb1.append(sb1);
        }
        System.out.println("Size of buffer: "+sb1.length());
        byte[] src = sb1.toString().getBytes("UTF-8");
        StringBuilder res;
        long startTime = System.currentTimeMillis();
        res = testStringConvert(src);
        System.out.println("Conversion using String time (msec): "+(System.currentTimeMillis()-startTime));
        if (!res.toString().equals(sb1.toString())) {
            System.err.println("Conversion error");
        }
        startTime = System.currentTimeMillis();
        res = testCBConvert(src);
        System.out.println("Conversion using CharBuffer time (msec): "+(System.currentTimeMillis()-startTime));
        if (!res.toString().equals(sb1.toString())) {
            System.err.println("Conversion error");
        }
    }
    private static StringBuilder testStringConvert(byte[] src) throws UnsupportedEncodingException {
        String s = new String(src, "UTF-8");
        StringBuilder b = new StringBuilder(s);
        return b;
    }
    private static StringBuilder testCBConvert(byte[] src) throws CharacterCodingException {
        Charset charset = Charset.forName("UTF-8");
        CharsetDecoder decoder = charset.newDecoder();
        ByteBuffer srcBuffer = ByteBuffer.wrap(src);
        CharBuffer resBuffer = decoder.decode(srcBuffer);
        StringBuilder b = new StringBuilder(resBuffer);
        return b;
    }
}

Results:

Size of buffer: 13631488
Conversion using String time (msec): 91
Conversion using CharBuffer time (msec): 252

And a modified (less memory-hungry) version on IDEONE: here.
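Keep in mind that the numbers above come from a single cold run of each conversion, so they include JIT warm-up. Below is a rough sketch of a slightly fairer comparison: each conversion is run a few times untimed, then the best of several timed runs is reported. This is still not a rigorous benchmark (a harness such as JMH would be the proper tool), the class and method names are made up, and the CharBuffer path uses Charset.decode instead of an explicit CharsetDecoder.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class WarmBenchmark {
    static StringBuilder viaString(byte[] src) {
        return new StringBuilder(new String(src, StandardCharsets.UTF_8));
    }

    static StringBuilder viaCharBuffer(byte[] src) {
        return new StringBuilder(StandardCharsets.UTF_8.decode(ByteBuffer.wrap(src)));
    }

    // Runs the conversion `rounds` times untimed, then returns the best of `rounds` timed runs.
    static long bestNanos(Function<byte[], StringBuilder> conversion, byte[] src, int rounds) {
        for (int i = 0; i < rounds; i++) {
            conversion.apply(src); // warm-up, result discarded
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            StringBuilder result = conversion.apply(src);
            long elapsed = System.nanoTime() - start;
            // Touch the result so the call has an observable use.
            if (result.length() == src.length) {
                best = Math.min(best, elapsed);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        StringBuilder data = new StringBuilder("abcdefghijklmnopqrstuvwxyz");
        for (int i = 0; i < 15; i++) {
            data.append(data); // doubles the content each time, ASCII only
        }
        byte[] src = data.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println("String (ns): " + bestNanos(WarmBenchmark::viaString, src, 10));
        System.out.println("CharBuffer (ns): " + bestNanos(WarmBenchmark::viaCharBuffer, src, 10));
    }
}
```

The length check inside bestNanos only holds for ASCII data, where one byte decodes to one char; that is an assumption of this sketch, not a general property.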

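If what you want is a short statement, there is no way around the intermediate String step. As a convenience for the very common case, the String constructor combines charset conversion with object construction, but StringBuilder has no such convenience constructor.

If performance is what you are interested in, you can avoid the intermediate String object with something like the following: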

new StringBuilder(Charset.forName(charsetName).decode(ByteBuffer.wrap(inBytes)))
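Note that Charset.decode always replaces malformed and unmappable input with the charset's replacement string instead of throwing. A small sketch of the one-liner wrapped in a helper, assuming UTF-8 input (the class name LenientDecodeDemo and the helper name are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LenientDecodeDemo {
    // Hypothetical helper around the one-liner; no checked exception is involved.
    static StringBuilder toStringBuilder(byte[] bytes) {
        return new StringBuilder(StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)));
    }

    public static void main(String[] args) {
        // 0xFF can never appear in valid UTF-8, so the decoder substitutes U+FFFD for it.
        byte[] bad = {'a', (byte) 0xFF, 'b'};
        StringBuilder sb = toStringBuilder(bad);
        System.out.println(sb.length());
        System.out.println(sb.charAt(1) == '\uFFFD');
    }
}
```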

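If you want to be able to fine-tune performance, you can control the decoding process yourself. For example, you may want to avoid using too much memory by using averageCharsPerByte to estimate how much will be needed. If that estimate turns out to be too small, you do not have to resize the buffer; instead, you can use the resulting StringBuilder to accumulate all the parts.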

CharsetDecoder cd = Charset.forName(charsetName).newDecoder();
cd.onMalformedInput(CodingErrorAction.REPLACE);
cd.onUnmappableCharacter(CodingErrorAction.REPLACE);
// Math.ceil returns a double, so it must be cast before assigning to an int
int lengthEstimate = (int) Math.ceil(cd.averageCharsPerByte() * inBytes.length) + 1;
ByteBuffer inBuf = ByteBuffer.wrap(inBytes);
CharBuffer outBuf = CharBuffer.allocate(lengthEstimate);
StringBuilder out = new StringBuilder(lengthEstimate);
CoderResult cr;
while (true) {
    cr = cd.decode(inBuf, outBuf, true);
    // flip() switches outBuf from writing to reading; without it, append would copy the unwritten tail
    outBuf.flip();
    out.append(outBuf);
    outBuf.clear();
    if (cr.isUnderflow()) break;             // all input consumed
    if (!cr.isOverflow()) cr.throwException(); // anything other than a full outBuf is an error
}
cr = cd.flush(outBuf);
if (!cr.isUnderflow()) cr.throwException();
outBuf.flip();
out.append(outBuf);
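To see the overflow branch of that loop actually run, here is a self-contained sketch that decodes through a deliberately tiny output buffer. It assumes UTF-8. The class name ChunkedDecodeDemo, the method decode, and the chunkChars parameter are all made up for illustration. The buffer must hold at least two chars, otherwise a supplementary character (a surrogate pair) could never fit and the loop would not terminate.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ChunkedDecodeDemo {
    // Decodes UTF-8 bytes through a fixed-size CharBuffer, accumulating chunks in a StringBuilder.
    static StringBuilder decode(byte[] inBytes, int chunkChars) throws CharacterCodingException {
        if (chunkChars < 2) {
            throw new IllegalArgumentException("chunkChars must be at least 2");
        }
        CharsetDecoder cd = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer inBuf = ByteBuffer.wrap(inBytes);
        CharBuffer outBuf = CharBuffer.allocate(chunkChars);
        StringBuilder out = new StringBuilder();
        while (true) {
            CoderResult cr = cd.decode(inBuf, outBuf, true);
            outBuf.flip();        // expose the chars written in this round
            out.append(outBuf);
            outBuf.clear();       // make the whole buffer writable again
            if (cr.isUnderflow()) break;
            if (!cr.isOverflow()) cr.throwException();
        }
        CoderResult flushed = cd.flush(outBuf);
        if (!flushed.isUnderflow()) flushed.throwException();
        outBuf.flip();
        out.append(outBuf);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "h\u00e9llo w\u00f6rld \uD83D\uDE00";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // A 4-char buffer forces several overflow rounds for this input.
        System.out.println(decode(bytes, 4).toString().equals(text));
    }
}
```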

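That said, I doubt the code above is worth the effort in most applications. If an application cares that much about performance, it probably should not be dealing with StringBuilder at all, and should handle everything at the buffer level instead.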
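As a rough illustration of what "buffer level" could mean, the sketch below scans the decoded CharBuffer directly and never creates a String or a StringBuilder. Counting newlines is just an arbitrary example task, and the class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class BufferLevelDemo {
    // Counts '\n' characters by reading the decoded CharBuffer directly.
    static int countNewlines(byte[] utf8Bytes) {
        CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(utf8Bytes));
        int count = 0;
        while (chars.hasRemaining()) {
            if (chars.get() == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] bytes = "a\nb\nc\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(countNewlines(bytes));
    }
}
```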
