Is there something like Buffer.LastPositionOf? How do I find the last occurrence of a character in a buffer?



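I have a buffer of type ReadOnlySequence<byte>. I want to extract a subsequence from it (which will contain 0 - n messages), knowing that every message ends with 0x1c, 0x0d (as described here).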

I know the buffer has an extension method PositionOf, but it returns the position of the first occurrence of item in the ReadOnlySequence<T>.

I'm looking for a method that returns the position of the last occurrence. I tried to implement it myself, and this is what I have so far:

private SequencePosition? GetLastPosition(ReadOnlySequence<byte> buffer)
{
    // Do not modify the real buffer
    ReadOnlySequence<byte> temporaryBuffer = buffer;
    SequencePosition? lastPosition = null;
    do
    {
        /*
        Find the first occurrence of the delimiters in the buffer
        This only takes a byte, what to do with the delimiters? { 0x1c, 0x0d }
        */
        SequencePosition? foundPosition = temporaryBuffer.PositionOf(???);
        // Is there still an occurrence?
        if (foundPosition != null)
        {
            lastPosition = foundPosition;
            // Cut off everything up to and including the match, so the next run searches the rest
            temporaryBuffer = temporaryBuffer.Slice(temporaryBuffer.GetPosition(1, foundPosition.Value));
        }
        else
        {
            // This is required because otherwise the loop is infinite once lastPosition has been set
            break;
        }
    } while (lastPosition != null);
    return lastPosition;
}

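I'm struggling with this. First, the PositionOf method only takes a single byte, but the delimiter consists of two bytes, so I would need to pass in a byte[]. Second, I suspect the loop could be optimized "somehow".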
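When the whole ReadOnlySequence<byte> happens to sit in one contiguous segment (for example, when it wraps a single array), the multi-byte problem already has a span-level answer: MemoryExtensions.LastIndexOf accepts a ReadOnlySpan<T> pattern. The sketch below assumes that single-segment case and reports when it does not apply, so a segment-aware search is still needed as a fallback. SingleSegmentSearch and TryFindLastInSingleSegment are hypothetical names.

```csharp
using System;
using System.Buffers;

public static class SingleSegmentSearch
{
    // Sketch, assuming the sequence is backed by one contiguous segment.
    // Returns false when that assumption does not hold, so the caller can fall
    // back to a segment-aware search (e.g. one built on SequenceReader<byte>).
    // When it returns true, position is null if the delimiter was not found.
    public static bool TryFindLastInSingleSegment(
        ReadOnlySequence<byte> buffer,
        ReadOnlySpan<byte> delimiter,
        out SequencePosition? position)
    {
        if (delimiter.IsEmpty)
        {
            throw new ArgumentException("delimiter is empty", nameof(delimiter));
        }

        position = null;
        if (!buffer.IsSingleSegment)
        {
            return false;
        }

        // Unlike PositionOf, MemoryExtensions.LastIndexOf accepts a multi-byte
        // pattern and returns the index of its last match (or -1).
        int index = buffer.First.Span.LastIndexOf(delimiter);
        if (index >= 0)
        {
            // Map the span index back to a position inside the sequence.
            position = buffer.GetPosition(index);
        }

        return true;
    }
}
```

Messages that arrive in several chunks (as they usually do with pipes or sockets) produce multi-segment sequences, where this shortcut returns false and a delimiter may even straddle two segments.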

Do you know how to find the last occurrence of this delimiter?

I went down a huge rabbit hole digging into this, but I managed to come up with an extension method that I think answers your question:

using System;
using System.Buffers;
using System.Collections.Generic;
using System.Linq;
public static class ReadOnlySequenceExtensions
{
    public static SequencePosition? LastPositionOf(
        this ReadOnlySequence<byte> source,
        byte[] delimiter)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }
        if (!delimiter.Any())
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }
        var reader = new SequenceReader<byte>(source);
        var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
        var delimiterFound = false;
        // Keep reading until we've consumed all delimiters
        // (the typed discard selects the ReadOnlySequence<byte> overload of TryReadTo explicitly)
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
        {
            delimiterFound = true;
        }
        if (!delimiterFound)
        {
            return null;
        }
        // If we got this far, we've consumed bytes up to,
        // and including, the last byte of the delimiter,
        // so we can use that to get the position of
        // the starting byte of the delimiter
        return reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
    }
}
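Since the original goal was to extract the subsequence holding the 0 - n complete messages, here is a hedged sketch of what could be done with the returned position. It assumes the position points at the first byte of the delimiter (which is what LastPositionOf returns); MessageFraming and SplitAfterDelimiter are hypothetical names.

```csharp
using System;
using System.Buffers;

public static class MessageFraming
{
    // Hypothetical helper: given the start of the last delimiter, split the buffer
    // into the complete messages (including that trailing delimiter) and the
    // leftover bytes of a message that has not fully arrived yet.
    public static (ReadOnlySequence<byte> Complete, ReadOnlySequence<byte> Remainder) SplitAfterDelimiter(
        ReadOnlySequence<byte> buffer,
        SequencePosition lastDelimiterStart,
        int delimiterLength)
    {
        // Step past the delimiter bytes so they stay with the complete messages.
        SequencePosition end = buffer.GetPosition(delimiterLength, lastDelimiterStart);
        return (buffer.Slice(0, end), buffer.Slice(end));
    }
}
```

If LastPositionOf returns null, there is no complete message yet and the whole buffer is the remainder.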

Here are some test cases as well:

var cases = new List<byte[]>
{
    // Case 1: Check an empty array
    new byte[0],
    // Case 2: Check an array with no delimiter
    new byte[] { 0xf },
    // Case 3: Check an array with part of the delimiter
    new byte[] { 0x1c },
    // Case 4: Check an array with the other part of the delimiter
    new byte[] { 0x0d },
    // Case 5: Check an array with the delimiter in the wrong order
    new byte[] { 0x0d, 0x1c },
    // Case 6: Check an array with a correct delimiter
    new byte[] { 0x1c, 0x0d },
    // Case 7: Check an array with a byte followed by a correct delimiter
    new byte[] { 0x1, 0x1c, 0x0d },
    // Case 8: Check an array with multiple correct delimiters
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d },
    // Case 9: Check an array with multiple correct delimiters
    // where the delimiter isn't the last byte
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d, 0x3 },
    // Case 10: Check an array with multiple sequential bytes of a delimiter
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x1c, 0x0d, 0x3 },
};
var delimiter = new byte[] { 0x1c, 0x0d };
foreach (var item in cases)
{
    var source = new ReadOnlySequence<byte>(item);
    var result = source.LastPositionOf(delimiter);
    // Put a breakpoint here and examine result
}

Cases 1-5 all correctly return null. Cases 6-10 all correctly return the SequencePosition of the first byte of the delimiter (0x1c in this case).
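All ten cases wrap a single array, so every sequence they produce has exactly one segment. Below is a sketch of a multi-segment check in which both delimiters straddle a segment boundary. ReadOnlySequenceSegment<T> is abstract, so a small subclass is needed to chain the chunks; ChunkSegment and MultiSegmentDemo are hypothetical helpers, and the LastPositionOf loop is repeated inline so the block compiles on its own.

```csharp
using System;
using System.Buffers;

// Hypothetical helper: chains byte arrays into a linked list of segments.
public sealed class ChunkSegment : ReadOnlySequenceSegment<byte>
{
    public ChunkSegment(byte[] data) => Memory = data;

    // Links a new segment after this one; RunningIndex is the offset of the
    // new segment's first byte within the whole sequence.
    public ChunkSegment Append(byte[] data)
    {
        var next = new ChunkSegment(data) { RunningIndex = RunningIndex + Memory.Length };
        Next = next;
        return next;
    }
}

public static class MultiSegmentDemo
{
    // Builds one ReadOnlySequence<byte> that spans all the given chunks.
    public static ReadOnlySequence<byte> Build(params byte[][] chunks)
    {
        var first = new ChunkSegment(chunks[0]);
        var last = first;
        for (int i = 1; i < chunks.Length; i++)
        {
            last = last.Append(chunks[i]);
        }
        return new ReadOnlySequence<byte>(first, 0, last, last.Memory.Length);
    }

    // Same loop as LastPositionOf above, repeated so this sketch is self-contained.
    public static SequencePosition? LastPositionOf(ReadOnlySequence<byte> source, byte[] delimiter)
    {
        var reader = new SequenceReader<byte>(source);
        var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
        long? lastStart = null;
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
        {
            // Consumed is just past the delimiter; step back to its first byte.
            lastStart = reader.Consumed - delimiter.Length;
        }
        return lastStart.HasValue ? source.GetPosition(lastStart.Value) : (SequencePosition?)null;
    }
}
```

Usage: MultiSegmentDemo.Build(new byte[] { 0x1, 0x1c }, new byte[] { 0x0d, 0x2, 0x1c }, new byte[] { 0x0d, 0x3 }) yields the same bytes as case 9 spread over three segments, and the last delimiter should again start at offset 4.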

I also tried to create an iterator version that yields a position each time a delimiter is found, like this:

while (reader.TryReadTo(out _, delimiterToFind, true))
{
    yield return reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
}

But SequenceReader<T> and ReadOnlySpan<T> can't be used in iterator blocks, so I came up with AllPositionsOf instead:

public static IEnumerable<SequencePosition> AllPositionsOf(
    this ReadOnlySequence<byte> source,
    byte[] delimiter)
{
    if (delimiter == null)
    {
        throw new ArgumentNullException(nameof(delimiter));
    }
    if (!delimiter.Any())
    {
        throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
    }
    var reader = new SequenceReader<byte>(source);
    var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
    var results = new List<SequencePosition>();
    // Collect every match eagerly, since this method cannot be an iterator
    while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
    {
        results.Add(reader.Sequence.GetPosition(reader.Consumed - delimiter.Length));
    }
    return results;
}

The test cases also pass with this version.
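If lazy enumeration matters, one possible way around the iterator restriction (a sketch, not the answer's method) is to keep the ref struct values out of the iterator entirely: the iterator only holds a ReadOnlySequence<byte> and a long offset, and an ordinary helper method creates a fresh SequenceReader<byte> for each step. EnumeratePositionsOf, Iterate and TryFindNext are hypothetical names.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

public static class LazyPositionExtensions
{
    // Validates eagerly, then hands off to a lazy iterator, so bad arguments
    // throw at the call site instead of on the first MoveNext().
    public static IEnumerable<SequencePosition> EnumeratePositionsOf(
        this ReadOnlySequence<byte> source,
        byte[] delimiter)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }
        if (delimiter.Length == 0)
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }
        return Iterate(source, delimiter);
    }

    // The iterator's state is a ReadOnlySequence<byte> and a long offset;
    // neither is a ref struct, so the compiler accepts it.
    private static IEnumerable<SequencePosition> Iterate(ReadOnlySequence<byte> source, byte[] delimiter)
    {
        long offset = 0;
        while (TryFindNext(source, delimiter, ref offset, out SequencePosition position))
        {
            yield return position;
        }
    }

    // SequenceReader<byte> and ReadOnlySpan<byte> live only inside this
    // ordinary (non-iterator) method.
    private static bool TryFindNext(
        ReadOnlySequence<byte> source,
        byte[] delimiter,
        ref long offset,
        out SequencePosition position)
    {
        var reader = new SequenceReader<byte>(source);
        reader.Advance(offset); // resume just after the previous match
        if (reader.TryReadTo(out ReadOnlySequence<byte> _, new ReadOnlySpan<byte>(delimiter), true))
        {
            position = source.GetPosition(reader.Consumed - delimiter.Length);
            offset = reader.Consumed;
            return true;
        }
        position = default;
        return false;
    }
}
```

The trade-off: each step re-creates the reader and skips the already-searched prefix with Advance(offset), which likely costs more as the sequence grows or fragments, whereas AllPositionsOf makes a single pass and buffers the results in a list.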

Update

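Now that I've slept on it and had a chance to think things over, I think the above can be improved, for the following reasons:

  1. SequenceReader<T> has a Rewind() method, which makes me think SequenceReader<T> is designed to be reused
  2. SequenceReader<T> seems to be intended to make working with ReadOnlySequence<T> easier in general
  3. Creating an extension method on ReadOnlySequence<T> that uses a SequenceReader<T> to read from that same ReadOnlySequence<T> seems backwards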

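Given the above, I think it makes more sense to avoid working with ReadOnlySequence<T> directly where possible, and instead prefer (and reuse) SequenceReader<T>. With that in mind, here is a different version of LastPositionOf, this time as an extension method on SequenceReader<T>: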

public static class SequenceReaderExtensions
{
    /// <summary>
    /// Finds the last occurrence of a delimiter in a given sequence.
    /// </summary>
    /// <param name="reader">The reader to read from.</param>
    /// <param name="delimiter">The delimiter to look for.</param>
    /// <param name="rewind">If true, rewinds the reader to its position prior to this method being called.</param>
    /// <returns>A SequencePosition if a delimiter is found, otherwise null.</returns>
    public static SequencePosition? LastPositionOf(
        this ref SequenceReader<byte> reader,
        byte[] delimiter,
        bool rewind)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }
        if (!delimiter.Any())
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }
        var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
        // Remember where the reader started so we can rewind later
        var consumed = reader.Consumed;
        var delimiterFound = false;
        // Keep reading until we've consumed all delimiters
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
        {
            delimiterFound = true;
        }
        if (!delimiterFound)
        {
            if (rewind)
            {
                reader.Rewind(reader.Consumed - consumed);
            }
            return null;
        }
        // If we got this far, we've consumed bytes up to,
        // and including, the last byte of the delimiter,
        // so we can use that to get the starting byte
        // of the delimiter
        var result = reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
        if (rewind)
        {
            reader.Rewind(reader.Consumed - consumed);
        }
        return result;
    }
}

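The test cases above still pass, but now the same reader can be reused. The method also lets you choose whether the reader should be rewound to the position it had before the call.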
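Because SequenceReader<T> is a struct, assigning it to a new local takes a snapshot of its state. A hedged alternative to the rewind flag searches such a copy, so the caller's reader is never advanced and no Rewind call is needed. PeekLastPositionOf is a hypothetical name, and this is a sketch rather than a drop-in replacement.

```csharp
using System;
using System.Buffers;

public static class SequenceReaderPeekExtensions
{
    // Hypothetical alternative to LastPositionOf(..., rewind: true): the search
    // runs on a copy of the reader, so the caller's reader stays where it was.
    public static SequencePosition? PeekLastPositionOf(
        this ref SequenceReader<byte> reader,
        ReadOnlySpan<byte> delimiter)
    {
        if (delimiter.IsEmpty)
        {
            throw new ArgumentException("delimiter is empty", nameof(delimiter));
        }

        // SequenceReader<T> is a struct: this assignment snapshots its current
        // state, and advancing the copy leaves the original untouched.
        SequenceReader<byte> copy = reader;
        long? lastStart = null;
        while (copy.TryReadTo(out ReadOnlySequence<byte> _, delimiter, true))
        {
            // Consumed is now just past the delimiter; step back to its first byte.
            lastStart = copy.Consumed - delimiter.Length;
        }

        return lastStart.HasValue
            ? copy.Sequence.GetPosition(lastStart.Value)
            : (SequencePosition?)null;
    }
}
```

Like the rewind: true path, the search starts from the reader's current position, so anything already consumed is not considered. The rewind flag remains the more flexible choice when the caller wants to continue reading from just past the last delimiter.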

Latest update