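I have a buffer of type ReadOnlySequence<byte>. I want to extract a subsequence (which will contain 0 - n messages) from it, knowing that each message ends with 0x1c, 0x0d (as described here).
I know there is an extension method PositionOf for the buffer, but it only returns the position of the first occurrence of item in the ReadOnlySequence<T>.
I'm looking for a way to get the position of the last occurrence instead. I tried to implement it myself, and this is what I have so far: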
private SequencePosition? GetLastPosition(ReadOnlySequence<byte> buffer)
{
    // Do not modify the real buffer
    ReadOnlySequence<byte> temporaryBuffer = buffer;
    SequencePosition? lastPosition = null;
    do
    {
        /*
            Find the first occurrence of the delimiters in the buffer
            This only takes a byte, what to do with the delimiters? { 0x1c, 0x0d }
        */
        SequencePosition? foundPosition = temporaryBuffer.PositionOf(???);
        // Is there still an occurrence?
        if (foundPosition != null)
        {
            lastPosition = foundPosition;
            // cut off the sequence for the next run
            temporaryBuffer = temporaryBuffer.Slice(0, lastPosition.Value);
        }
        else
        {
            // this is required because otherwise this loop is infinite if lastPosition was set once
            break;
        }
    } while (lastPosition != null);
    return lastPosition;
}
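I'm struggling with this. First, the PositionOf method only takes a single byte, but there are two delimiter bytes, so I would need to pass in a byte[]. Second, I suspect the loop could "somehow" be optimized.
Do you have any idea how to find the last occurrence of these delimiters?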
I went down a massive rabbit hole digging into this, but I managed to come up with an extension method that I think answers your question:
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Linq;

public static class ReadOnlySequenceExtensions
{
    public static SequencePosition? LastPositionOf(
        this ReadOnlySequence<byte> source,
        byte[] delimiter)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }
        if (!delimiter.Any())
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }
        var reader = new SequenceReader<byte>(source);
        var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
        var delimiterFound = false;
        // Keep reading until we've consumed all delimiters.
        // The discard is typed explicitly so the call stays unambiguous on
        // runtimes that also offer an overload with an out ReadOnlySpan<T>.
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
        {
            delimiterFound = true;
        }
        if (!delimiterFound)
        {
            return null;
        }
        // If we got this far, we've consumed bytes up to,
        // and including, the last byte of the delimiter,
        // so we can use that to get the position of
        // the starting byte of the delimiter
        return reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
    }
}
Here are some test cases as well:
var cases = new List<byte[]>
{
    // Case 1: Check an empty array
    new byte[0],
    // Case 2: Check an array with no delimiter
    new byte[] { 0xf },
    // Case 3: Check an array with part of the delimiter
    new byte[] { 0x1c },
    // Case 4: Check an array with the other part of the delimiter
    new byte[] { 0x0d },
    // Case 5: Check an array with the delimiter in the wrong order
    new byte[] { 0x0d, 0x1c },
    // Case 6: Check an array with a correct delimiter
    new byte[] { 0x1c, 0x0d },
    // Case 7: Check an array with a byte followed by a correct delimiter
    new byte[] { 0x1, 0x1c, 0x0d },
    // Case 8: Check an array with multiple correct delimiters
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d },
    // Case 9: Check an array with multiple correct delimiters
    // where the delimiter isn't the last byte
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d, 0x3 },
    // Case 10: Check an array with multiple sequential bytes of a delimiter
    new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x1c, 0x0d, 0x3 },
};
var delimiter = new byte[] { 0x1c, 0x0d };
foreach (var item in cases)
{
    var source = new ReadOnlySequence<byte>(item);
    var result = source.LastPositionOf(delimiter);
} // Put a breakpoint here and examine result
Cases 1-5 all correctly return null. Cases 6-10 all correctly return a SequencePosition pointing at the first byte of the delimiter (i.e. 0x1c in this case).
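Since the question is ultimately about extracting the complete messages, here is a rough sketch of how the returned position could be used to split the buffer. The names MessageBufferExample, FindLastDelimiter and SplitAtLastDelimiter are made up for illustration; FindLastDelimiter is just a compact, non-extension copy of the logic above so that the sketch compiles on its own.

```csharp
using System;
using System.Buffers;

public static class MessageBufferExample
{
    // Compact copy of the LastPositionOf logic (hypothetical helper name).
    private static SequencePosition? FindLastDelimiter(
        ReadOnlySequence<byte> source, byte[] delimiter)
    {
        if (delimiter == null || delimiter.Length == 0)
        {
            throw new ArgumentException("delimiter is null or empty", nameof(delimiter));
        }
        var reader = new SequenceReader<byte>(source);
        var found = false;
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiter, true))
        {
            found = true;
        }
        if (!found)
        {
            return null;
        }
        return source.GetPosition(reader.Consumed - delimiter.Length);
    }

    // Hypothetical helper: everything up to and including the last delimiter
    // goes into Complete, whatever follows it goes into Rest.
    public static (ReadOnlySequence<byte> Complete, ReadOnlySequence<byte> Rest)
        SplitAtLastDelimiter(ReadOnlySequence<byte> source, byte[] delimiter)
    {
        SequencePosition? last = FindLastDelimiter(source, delimiter);
        if (last == null)
        {
            // No complete message yet: keep the whole buffer for later.
            return (ReadOnlySequence<byte>.Empty, source);
        }
        // Move from the first delimiter byte to just past the delimiter.
        SequencePosition end = source.GetPosition(delimiter.Length, last.Value);
        return (source.Slice(0, end), source.Slice(end));
    }

    public static void Main()
    {
        var delimiter = new byte[] { 0x1c, 0x0d };
        var source = new ReadOnlySequence<byte>(
            new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d, 0x3 });

        var (complete, rest) = SplitAtLastDelimiter(source, delimiter);
        Console.WriteLine($"{complete.Length} {rest.Length}"); // prints "6 1"
    }
}
```

In a read loop, Rest would presumably be the part you leave unconsumed until more data arrives.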
I also tried to create an iterator version that would yield a position each time a delimiter is found, like this:
while (reader.TryReadTo(out _, delimiterToFind, true))
{
    yield return reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
}
However, SequenceReader<T> and ReadOnlySpan<T> can't be used in iterator blocks, so I came up with AllPositionsOf instead:
public static IEnumerable<SequencePosition> AllPositionsOf(
    this ReadOnlySequence<byte> source,
    byte[] delimiter)
{
    if (delimiter == null)
    {
        throw new ArgumentNullException(nameof(delimiter));
    }
    if (!delimiter.Any())
    {
        throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
    }
    var reader = new SequenceReader<byte>(source);
    var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
    var results = new List<SequencePosition>();
    // Collect the starting position of every delimiter found.
    // The discard is typed explicitly to keep overload resolution unambiguous.
    while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
    {
        results.Add(reader.Sequence.GetPosition(reader.Consumed - delimiter.Length));
    }
    return results;
}
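The test cases work fine with this too.
Update
Now that I've slept on it and had a chance to think things over, I think the above can be improved, for the following reasons:
- SequenceReader<T> has a Rewind() method, which makes me think SequenceReader<T> is designed to be reused
- SequenceReader<T> seems intended to make working with ReadOnlySequence<T> easier in general
- Creating an extension method on ReadOnlySequence<T> in order to use a SequenceReader<T> to read from a ReadOnlySequence<T> seems backwards
Given the above, I think it probably makes more sense to avoid working directly with ReadOnlySequence<T>s where possible, and to prefer and reuse SequenceReader<T> instead. With that in mind, here's a different version of LastPositionOf, which is now an extension method on SequenceReader<T>: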
public static class SequenceReaderExtensions
{
    /// <summary>
    /// Finds the last occurrence of a delimiter in a given sequence.
    /// </summary>
    /// <param name="reader">The reader to read from.</param>
    /// <param name="delimiter">The delimiter to look for.</param>
    /// <param name="rewind">If true, rewinds the reader to its position prior to this method being called.</param>
    /// <returns>A SequencePosition if a delimiter is found, otherwise null.</returns>
    public static SequencePosition? LastPositionOf(
        this ref SequenceReader<byte> reader,
        byte[] delimiter,
        bool rewind)
    {
        if (delimiter == null)
        {
            throw new ArgumentNullException(nameof(delimiter));
        }
        if (!delimiter.Any())
        {
            throw new ArgumentException($"{nameof(delimiter)} is empty", nameof(delimiter));
        }
        var delimiterToFind = new ReadOnlySpan<byte>(delimiter);
        // Remember where the reader started so we can rewind to it later
        var consumed = reader.Consumed;
        var delimiterFound = false;
        // Keep reading until we've consumed all delimiters.
        // The discard is typed explicitly to keep overload resolution unambiguous.
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiterToFind, true))
        {
            delimiterFound = true;
        }
        if (!delimiterFound)
        {
            if (rewind)
            {
                reader.Rewind(reader.Consumed - consumed);
            }
            return null;
        }
        // If we got this far, we've consumed bytes up to,
        // and including, the last byte of the delimiter,
        // so we can use that to get the starting byte
        // of the delimiter
        var result = reader.Sequence.GetPosition(reader.Consumed - delimiter.Length);
        if (rewind)
        {
            reader.Rewind(reader.Consumed - consumed);
        }
        return result;
    }
}
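The test cases above continue to pass, but we can now reuse the same reader. It also lets you specify whether the reader should be rewound to the position it was at before the call.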
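To illustrate what reusing the reader might look like, here is a minimal sketch. The names ReaderReuseExample, FindLastDelimiter and Inspect are hypothetical; FindLastDelimiter is a compact, non-extension copy of the reader-based logic above (argument validation omitted), included only so the sketch compiles on its own.

```csharp
using System;
using System.Buffers;

public static class ReaderReuseExample
{
    // Compact copy of the reader-based LastPositionOf (hypothetical helper name).
    private static SequencePosition? FindLastDelimiter(
        ref SequenceReader<byte> reader, byte[] delimiter, bool rewind)
    {
        long start = reader.Consumed;
        bool found = false;
        while (reader.TryReadTo(out ReadOnlySequence<byte> _, delimiter, true))
        {
            found = true;
        }
        SequencePosition? result = found
            ? reader.Sequence.GetPosition(reader.Consumed - delimiter.Length)
            : (SequencePosition?)null;
        if (rewind)
        {
            reader.Rewind(reader.Consumed - start);
        }
        return result;
    }

    // Hypothetical helper: searches with rewind: true, then keeps using the
    // same reader to read the first message from the start of the buffer.
    public static (long LastOffset, long ConsumedAfterSearch, long FirstMessageLength)
        Inspect(ReadOnlySequence<byte> buffer, byte[] delimiter)
    {
        var reader = new SequenceReader<byte>(buffer);
        SequencePosition? last = FindLastDelimiter(ref reader, delimiter, rewind: true);
        long lastOffset = last.HasValue ? buffer.Slice(0, last.Value).Length : -1;
        long consumedAfterSearch = reader.Consumed;

        // The reader was rewound, so this reads the first message again.
        long firstLength = -1;
        if (reader.TryReadTo(out ReadOnlySequence<byte> first, delimiter, true))
        {
            firstLength = first.Length;
        }
        return (lastOffset, consumedAfterSearch, firstLength);
    }

    public static void Main()
    {
        var delimiter = new byte[] { 0x1c, 0x0d };
        var buffer = new ReadOnlySequence<byte>(
            new byte[] { 0x1, 0x1c, 0x0d, 0x2, 0x1c, 0x0d, 0x3 });

        var (lastOffset, consumed, firstLength) = Inspect(buffer, delimiter);
        Console.WriteLine($"{lastOffset} {consumed} {firstLength}"); // prints "4 0 1"
    }
}
```

With rewind set to false, the reader would instead stay just past the last delimiter, which may be handy if all you need next is the trailing partial message.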