How to eat bytes out of a stream



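I'm fixing a ZIP library class. Internally, nearly all ZIP implementations use DEFLATE compression (RFC 1951).

The problem is that, in Delphi, I don't have access to any DEFLATE compression library. What we do have plenty of is ZLIB compression code (RFC 1950). It even ships with Delphi, and there are half a dozen other implementations around.

Internally, ZLIB also uses DEFLATE for compression. So I want to do what everyone else has done: use the Delphi zlib library for its DEFLATE compression functionality.

The problem is that ZLIB adds a 2-byte prefix and a 4-byte trailer to the deflated data: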

[CMF]                1 byte
[FLG]                1 byte
[...deflate compressed data...]
[Adler-32 checksum]  4 bytes

So what I need is a way to compress data using the standard TCompressionStream (or TZCompressionStream, or TZCompressionStreamEx, depending on which source code you use):

procedure CompressDataToTargetStream(sourceStream: TStream; targetStream: TStream);
var
   compressor: TCompressionStream;
begin
   compressor := TCompressionStream.Create(clDefault, targetStream); //clDefault = CompressionLevel
   try
      compressor.CopyFrom(sourceStream, sourceStream.Size); //TStream exposes Size, not Length
   finally
      compressor.Free; 
   end;
end;

This works, except that it writes out the leading 2 bytes and the trailing 4 bytes; I need to strip those.

So I wrote a TByteEaterStream:

TByteEaterStream = class(TStream)
public
   constructor Create(TargetStream: TStream; 
         LeadingBytesToEat, TrailingBytesToEat: Integer);
end;

For example:

procedure CompressDataToTargetStream(sourceStream: TStream; targetStream: TStream);
var
   byteEaterStream: TByteEaterStream;
   compressor: TCompressionStream;
begin
   byteEaterStream := TByteEaterStream.Create(targetStream, 2, 4); //2 leading bytes, 4 trailing bytes
   try
      compressor := TCompressionStream.Create(clDefault, byteEaterStream); //clDefault = CompressionLevel
      try
         compressor.CopyFrom(sourceStream, sourceStream.Size); //TStream exposes Size, not Length
      finally
         compressor.Free; 
      end;
   finally
      byteEaterStream.Free;
   end;
end;

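This stream overrides the Write method. Eating the first 2 bytes is trivial. The trick is eating the trailing 4 bytes.

The eater stream has a 4-byte array, and I always keep the last four bytes of every write in that buffer. When the eater stream is destroyed, the trailing four bytes are destroyed with it.

The problem is that shuffling a few million writes through this buffer kills performance. The typical upstream use is: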

for each of a million data rows
    stream.Write(s, Length(s)); //30-90 character string

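I definitely do not want upstream users to have to signal that "the end is near". I just want it to be faster.

The Question

Watching a stream of bytes go by, what is the best way to hold back the last four bytes, given that you don't know which write will be the last?

The code I'm fixing wrote the entire compressed version to a TStringStream, and then grabbed only 900MB - 6 bytes to get at the inner DEFLATE data: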

cs := TStringStream.Create('');
....write compressed data to cs
S := Copy(CS.DataString, 3, Length(CS.DataString) - 6);

Except that this runs the user out of memory. Initially I changed it to write to a TFileStream, and then I could perform the same trick.
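For reference, the "trim afterwards" trick can be written against any seekable intermediate stream, such as a TFileStream. The following is only a sketch: CopyInnerDeflate is a hypothetical helper name, and the demo feeds it stand-in bytes rather than real zlib output.

```pascal
program TrimAfterwardsDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

// Hypothetical helper: copies a complete, already stored zlib stream
// minus its 2-byte header and 4-byte Adler-32 trailer.
procedure CopyInnerDeflate(ZlibData, Target: TStream);
begin
  if ZlibData.Size < 6 then
    raise Exception.Create('Too short to be a zlib stream');
  ZlibData.Position := 2; // skip CMF and FLG
  // Guard the empty case: CopyFrom with Count = 0 would copy the whole stream.
  if ZlibData.Size > 6 then
    Target.CopyFrom(ZlibData, ZlibData.Size - 6); // stop before the checksum
end;

var
  Stored, Target: TMemoryStream;
  Fake, Inner: AnsiString;
begin
  Fake := '11Hello3333'; // stand-in bytes, not real zlib output
  Stored := TMemoryStream.Create;
  Target := TMemoryStream.Create;
  try
    Stored.WriteBuffer(Fake[1], Length(Fake));
    CopyInnerDeflate(Stored, Target);
    SetString(Inner, PAnsiChar(Target.Memory), Integer(Target.Size));
    Writeln(Inner); // prints Hello
  finally
    Target.Free;
    Stored.Free;
  end;
end.
```

This avoids holding everything in a string, but it still needs the whole compressed output stored somewhere before the final copy.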

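But I want the better solution: the streaming solution. I want the data to go into the final stream compressed, with no intermediate storage.

My Implementation

Not that it helps any, because I'm not necessarily asking for a system that even uses an adapter stream to do the trimming.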

TByteEaterStream = class(TStream)
private
    FTargetStream: TStream;
    FTargetStreamOwnership: TStreamOwnership;
    FLeadingBytesToEat: Integer;
    FTrailingBytesToEat: Integer;
    FLeadingBytesRemaining: Integer;
    FBuffer: array of Byte;
    FValidBufferLength: Integer;
public
    constructor Create(TargetStream: TStream; LeadingBytesToEat, TrailingBytesToEat: Integer; StreamOwnership: TStreamOwnership=soReference);
    destructor Destroy; override;
    class procedure SelfTest;
    procedure Flush;
    function Read(var Buffer; Count: Longint): Longint; override;
    function Write(const Buffer; Count: Longint): Longint; override;
    function Seek(Offset: Longint; Origin: Word): Longint; override;
end;
{ TByteEaterStream }
constructor TByteEaterStream.Create(TargetStream: TStream; LeadingBytesToEat, TrailingBytesToEat: Integer; StreamOwnership: TStreamOwnership=soReference);
begin
    inherited Create;
    //User requested state
    FTargetStream := TargetStream;
    FTargetStreamOwnership := StreamOwnership;
    FLeadingBytesToEat := LeadingBytesToEat;
    FTrailingBytesToEat := TrailingBytesToEat;
    //internal housekeeping
    FLeadingBytesRemaining := FLeadingBytesToEat;
    SetLength(FBuffer, FTrailingBytesToEat);
    FValidBufferLength := 0;
end;
destructor TByteEaterStream.Destroy;
begin
    if FTargetStreamOwnership = soOwned then
        FTargetStream.Free;
    FTargetStream := nil;
    inherited;
end;
procedure TByteEaterStream.Flush;
begin
    if FValidBufferLength > 0 then
    begin
        FTargetStream.Write(FBuffer[0], FValidBufferLength);
        FValidBufferLength  := 0;
    end;
end;
function TByteEaterStream.Write(const Buffer; Count: Integer): Longint;
var
    newStart: Pointer;
    totalCount: Integer;
    addIndex: Integer;
    bufferValidLength: Integer;
    bytesToWrite: Integer;
begin
    Result := Count;
    if Count = 0 then
        Exit;
    if FLeadingBytesRemaining > 0 then
    begin
        newStart := Addr(Buffer);
        Inc(PAnsiChar(newStart)); //step one byte; a Cardinal cast truncates pointers on 64-bit
        Dec(Count);
        Dec(FLeadingBytesRemaining);
        Result := Self.Write(newStart^, Count)+1; //tell the upstream guy that we wrote it
        Exit;
    end;
    if FTrailingBytesToEat > 0 then
    begin
        if (Count < FTrailingBytesToEat) then
        begin
            //There's less bytes incoming than an entire buffer
            //But the buffer might overfloweth
            totalCount := FValidBufferLength+Count;
            //If it could all fit in the buffer, then let it
            if (totalCount <= FTrailingBytesToEat) then
            begin
                Move(Buffer, FBuffer[FValidBufferLength], Count);
                FValidBufferLength := totalCount;
            end
            else
            begin
                //We're going to overflow the buffer.
                //Purge from the buffer the amount that would get pushed
                bytesToWrite := totalCount-FTrailingBytesToEat;
                FTargetStream.Write(FBuffer[0], bytesToWrite);
                //Shuffle the buffer down (overlapped move)
                bufferValidLength := FValidBufferLength - bytesToWrite;
                Move(FBuffer[bytesToWrite], FBuffer[0], bufferValidLength);
                addIndex := bufferValidLength; //where we will add the data to
                Move(Buffer, FBuffer[addIndex], Count);
                FValidBufferLength := FTrailingBytesToEat; //the buffer is now exactly full
            end;
        end
        else if (Count = FTrailingBytesToEat) then
        begin
            //The incoming bytes exactly fill the buffer. Flush what we have and eat the incoming amounts
            Flush;
            Move(Buffer, FBuffer[0], FTrailingBytesToEat);
            FValidBufferLength := FTrailingBytesToEat;
            Result := FTrailingBytesToEat; //we "wrote" n bytes
        end
        else
        begin
            //Count is greater than trailing buffer eat size
            Flush;
            //Write the data that is definitely not to be eaten
            bytesToWrite := Count-FTrailingBytesToEat;
            FTargetStream.Write(Buffer, bytesToWrite);
            //Buffer the remainder
            newStart := Addr(Buffer);
            Inc(PAnsiChar(newStart), bytesToWrite); //byte-sized pointer step, safe on 64-bit
            Move(newStart^, FBuffer[0], FTrailingBytesToEat);
            FValidBufferLength := FTrailingBytesToEat;
        end;
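    end
    else
        Result := FTargetStream.Write(Buffer, Count); //no trailing bytes to hold back; pass straight through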
end;
function TByteEaterStream.Seek(Offset: Integer; Origin: Word): Longint;
begin
    //what does it mean if they want to seek around when i'm supposed to be eating data?
    //i don't know; so results are, by definition, undefined. Don't use at your own risk
    Result := FTargetStream.Seek(Offset, Origin);
end;
function TByteEaterStream.Read(var Buffer; Count: Integer): Longint;
begin
    //what does it mean if they want to read back bytes when i'm supposed to be eating data?
    //i don't know; so results are, by definition, undefined. Don't use at your own risk
    Result := FTargetStream.Read({var}Buffer, Count);
end;
class procedure TByteEaterStream.SelfTest;
    procedure CheckEquals(Expected, Actual: string; Message: string);
    begin
        if Actual <> Expected then
            raise Exception.CreateFmt('TByteEaterStream self-test failed. Expected "%s", but was "%s". Message: %s', [Expected, Actual, Message]);
    end;
    procedure Test(const InputString: string; ExpectedString: string);
    var
        s: TStringStream;
        eater: TByteEaterStream;
    begin
        s := TStringStream.Create('');
        try
            eater := TByteEaterStream.Create(s, 2, 4, soReference);
            try
                eater.Write(InputString[1], Length(InputString));
            finally
                eater.Free;
            end;
            CheckEquals(ExpectedString, s.DataString, InputString);
        finally
            s.Free;
        end;
    end;
begin
    Test('1', '');
    Test('11', '');
    Test('113', '');
    Test('1133', '');
    Test('11333', '');
    Test('113333', '');
    Test('11H3333', 'H');
    Test('11He3333', 'He');
    Test('11Hel3333', 'Hel');
    Test('11Hell3333', 'Hell');
    Test('11Hello3333', 'Hello');
    Test('11Hello,3333', 'Hello,');
    Test('11Hello, 3333', 'Hello, ');
    Test('11Hello, W3333', 'Hello, W');
    Test('11Hello, Wo3333', 'Hello, Wo');
    Test('11Hello, Wor3333', 'Hello, Wor');
    Test('11Hello, Worl3333', 'Hello, Worl');
    Test('11Hello, World3333', 'Hello, World');
    Test('11Hello, World!3333', 'Hello, World!');
end;

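The whole problem can be avoided by simply asking zlib not to wrap the deflate stream. I don't see the interface to zlib in the question's code, but somewhere there is an initialization using deflateInit() or deflateInit2(). If deflateInit2() is used, you can pass -15 instead of 15 for the windowBits parameter to request unwrapped (raw) deflate output.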
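In Delphi this might look like the sketch below. It assumes a ZLib unit (System.ZLib in XE2 and later) whose TZCompressionStream has a constructor overload taking a windowBits argument that is passed on to deflateInit2(); check your own ZLib source before relying on it. CompressRawDeflate is a hypothetical helper name.

```pascal
program RawDeflateDemo;
{$APPTYPE CONSOLE}
uses
  System.SysUtils, System.Classes, System.ZLib;

// Hypothetical helper: compresses Source into Target as raw DEFLATE (RFC 1951).
procedure CompressRawDeflate(Source, Target: TStream);
var
  Compressor: TZCompressionStream;
begin
  // ASSUMPTION: this (dest, level, windowBits) overload exists in your ZLib unit.
  // -15 requests a 32 KB window with no zlib header and no Adler-32 trailer.
  Compressor := TZCompressionStream.Create(Target, zcDefault, -15);
  try
    Compressor.CopyFrom(Source, 0); // Count = 0 copies the whole source stream
  finally
    Compressor.Free; // freeing the compressor flushes the remaining output
  end;
end;

var
  Source, Target: TMemoryStream;
  Data: AnsiString;
begin
  Data := 'Hello, World! Hello, World! Hello, World!';
  Source := TMemoryStream.Create;
  Target := TMemoryStream.Create;
  try
    Source.WriteBuffer(Data[1], Length(Data));
    CompressRawDeflate(Source, Target);
    Writeln('Raw DEFLATE size: ', Target.Size, ' bytes');
  finally
    Target.Free;
    Source.Free;
  end;
end.
```

With this approach no byte-eating adapter is needed at all, since zlib never emits the wrapper in the first place.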

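You need to defer writing until you are sure that the bytes you are about to write are not trailing bytes that must be eaten. That observation suggests that buffering will provide the solution.

So, I suggest this: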

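  1. Use a buffered stream adapter.
  2. Eating the leading bytes is easy. Just consign the first two bytes to oblivion.
  3. After that, buffer up the bytes to be written, and when you need to flush, flush all but the last four bytes in the buffer.
  4. When you flush, copy the four bytes that were not flushed to the beginning of the buffer so that you don't lose them.
  5. When you close the stream, flush it just as you would any buffered stream, using the same flushing technique as before so that the last four bytes are held back. At this point you know that these are the final four bytes of the stream.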

The one requirement this approach imposes is that the buffer must be larger than the number of trailing bytes to strip.
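The buffered approach can be sketched roughly as follows. TTrimmingStream and its members are hypothetical names for illustration, not the asker's TByteEaterStream, and the demo uses a deliberately tiny 8-byte buffer to force several flushes; a real adapter would use something like 64 KB.

```pascal
program TrimmingStreamDemo;
{$APPTYPE CONSOLE}
uses
  SysUtils, Classes;

type
  // Hypothetical write-only adapter: drops Lead leading and Trail trailing bytes.
  TTrimmingStream = class(TStream)
  private
    FTarget: TStream;
    FLeadRemaining, FTrail, FUsed: Integer;
    FBuf: array of Byte;
    procedure FlushKeepingTail;
  public
    constructor Create(Target: TStream; Lead, Trail, BufSize: Integer);
    destructor Destroy; override;
    function Read(var Buffer; Count: Longint): Longint; override;
    function Write(const Buffer; Count: Longint): Longint; override;
    function Seek(Offset: Longint; Origin: Word): Longint; override;
  end;

constructor TTrimmingStream.Create(Target: TStream; Lead, Trail, BufSize: Integer);
begin
  inherited Create;
  if BufSize <= Trail then
    raise Exception.Create('Buffer must be larger than the trailing byte count');
  FTarget := Target;
  FLeadRemaining := Lead;
  FTrail := Trail;
  SetLength(FBuf, BufSize);
end;

destructor TTrimmingStream.Destroy;
begin
  FlushKeepingTail; // whatever stays in FBuf is the trailer, and is dropped
  inherited;
end;

procedure TTrimmingStream.FlushKeepingTail;
var
  N: Integer;
begin
  N := FUsed - FTrail; // everything except the last FTrail bytes
  if N <= 0 then Exit;
  FTarget.WriteBuffer(FBuf[0], N);
  if FTrail > 0 then
    Move(FBuf[N], FBuf[0], FTrail); // keep the held-back bytes at the front
  FUsed := FTrail;
end;

function TTrimmingStream.Write(const Buffer; Count: Longint): Longint;
var
  P: PByte;
  N: Integer;
begin
  Result := Count;
  P := @Buffer;
  N := FLeadRemaining; // eat leading bytes first
  if N > Count then N := Count;
  Inc(P, N); Dec(Count, N); Dec(FLeadRemaining, N);
  while Count > 0 do
  begin
    if FUsed = Length(FBuf) then FlushKeepingTail;
    N := Length(FBuf) - FUsed;
    if N > Count then N := Count;
    Move(P^, FBuf[FUsed], N); // small writes cost one Move, no target call
    Inc(FUsed, N); Inc(P, N); Dec(Count, N);
  end;
end;

function TTrimmingStream.Read(var Buffer; Count: Longint): Longint;
begin
  raise EStreamError.Create('TTrimmingStream is write-only');
end;

function TTrimmingStream.Seek(Offset: Longint; Origin: Word): Longint;
begin
  raise EStreamError.Create('TTrimmingStream does not support seeking');
end;

var
  Dest: TMemoryStream;
  Trimmer: TTrimmingStream;
  Input, Output: AnsiString;
  I: Integer;
begin
  Input := '11Hello, World!3333';
  Dest := TMemoryStream.Create;
  try
    Trimmer := TTrimmingStream.Create(Dest, 2, 4, 8);
    try
      for I := 1 to Length(Input) do
        Trimmer.Write(Input[I], 1); // many tiny writes, as in the question
    finally
      Trimmer.Free;
    end;
    SetString(Output, PAnsiChar(Dest.Memory), Integer(Dest.Size));
    Writeln(Output); // prints Hello, World!
  finally
    Dest.Free;
  end;
end.
```

Because small writes only append to the buffer, the target stream is touched once per buffer-full rather than once per write, which is where the speed-up over a 4-byte holding buffer would come from.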
