I want to take a streamed IEnumerable of values, for example (the real application will stream data records from a DataReader):
var tuples = new (int, int)[]
{
    (0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2),
};
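I want to keep state and watch for every change in the left field. Each run of items that share the same left field should be returned as its own IEnumerable, giving a sequence of IEnumerables (the data will be pre-sorted, so I don't have to worry about sorting on my side).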
(0,0) (0,1) (0,2) (0,3)
(1,0) (1,1)
(2,0) (2,1) (2,2)
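This may not be possible, but I'd like to do it without creating a temporary per-group list that holds the records in RAM, because each group will be quite large. In other words, each group should somehow pull its items directly from the original IEnumerable.
An inner TakeWhile seemed like the way to go, but it always restarts the iteration over tg from the beginning.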
private int currentGroup;
public IEnumerator<IEnumerable<Tuple<int, int>>> GetEnumerator()
{
    var tg = TupleGenerator();
    foreach (Tuple<int, int> item in tg)
    {
        currentGroup = item.Item1;
        yield return tg.TakeWhile((x) => x.Item1 == currentGroup);
    }
}
static IEnumerable<Tuple<int, int>> TupleGenerator()
{
    for (int i = 0; i < 10; i++)
    {
        for (int j = 0; j < 10; j++)
        {
            yield return new Tuple<int, int>(i, j);
        }
    }
}
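To make the restart visible, here is a minimal sketch (not part of the original post). The `RestartDemo` class, the `Starts` counter and the smaller 3 x 2 generator are hypothetical names and sizes chosen for illustration. Every `TakeWhile` call asks `tg` for a fresh enumerator, so the generator body runs again from the top each time:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RestartDemo
{
    // Hypothetical counter: how many times the generator body started from the top.
    public static int Starts;

    public static IEnumerable<Tuple<int, int>> TupleGenerator()
    {
        Starts++; // executed once per fresh enumeration, on its first MoveNext
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 2; j++)
                yield return Tuple.Create(i, j);
    }

    // Mirrors the loop from the question; ToList() is only used to observe each result.
    public static List<int> GroupSizes()
    {
        var sizes = new List<int>();
        var tg = TupleGenerator();
        foreach (var item in tg) // outer enumeration
        {
            int currentGroup = item.Item1;
            // TakeWhile enumerates tg again, starting at (0, 0) rather than at `item`.
            sizes.Add(tg.TakeWhile(x => x.Item1 == currentGroup).ToList().Count);
        }
        return sizes;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", GroupSizes())); // 2, 2, 0, 0, 0, 0
        Console.WriteLine(Starts);                          // 7
    }
}
```

Only the items with key 0 ever match, because each inner sequence begins at (0, 0) and stops at the first item whose key differs from `currentGroup`. To share one position between the outer and inner sequences, the code has to hold a single enumerator and advance it manually.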
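So while it is possible to avoid holding a whole group's data in memory, streaming each item on request with a fixed memory footprint, doing so has drawbacks, and it is important to understand why buffering is the typical solution to this kind of problem. First, the code that avoids buffering is more complex. Second, the consumer must always iterate each inner IEnumerable to completion before requesting the next group, and no inner IEnumerable may be iterated more than once. If you break either rule and the code does not explicitly check for it, things go wrong silently: items that belong to one group end up spread across several groups, you get the wrong number of groups, and so on. Given how easy these mistakes are to make and how bad the consequences are, it is well worth checking for them explicitly and throwing an exception, so the consumer at least knows something is wrong and needs fixing.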
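For comparison, the buffering approach mentioned above can be sketched roughly as follows. This is an illustration only: `GroupAdjacentBuffered` is a hypothetical name, and the sketch assumes the default equality comparer for keys. It holds one complete group in a `List<T>` at a time, so memory use grows with the largest group, but the consumer is free to skip groups, enumerate them twice, or call `ToList()` on the outer sequence.

```csharp
using System;
using System.Collections.Generic;

public static class BufferedGrouping
{
    // Hypothetical helper: groups adjacent items with equal keys,
    // buffering one whole group in memory before handing it out.
    public static IEnumerable<List<T>> GroupAdjacentBuffered<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        List<T> current = null;
        TKey currentKey = default(TKey);

        foreach (var item in source)
        {
            TKey key = keySelector(item);
            if (current != null && !comparer.Equals(currentKey, key))
            {
                yield return current; // key changed: the buffered group is complete
                current = null;
            }
            if (current == null)
            {
                current = new List<T>(); // start buffering a new group
                currentKey = key;
            }
            current.Add(item);
        }

        if (current != null)
            yield return current; // last group, if the source was not empty
    }
}
```

Called as `data.GroupAdjacentBuffered(pair => pair.Item1)`, each yielded list is independent of the source position, which is exactly the safety that the non-buffering version below gives up.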
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
    this IEnumerable<T> source,
    Func<T, T, bool> predicate)
{
    // A single enumerator is shared by the outer sequence and every inner group.
    using (var iterator = source.GetEnumerator())
    {
        bool previousGroupFinished = true;
        bool sourceExhausted = !iterator.MoveNext();
        while (!sourceExhausted)
        {
            // The consumer must drain each group before asking for the next one.
            if (!previousGroupFinished)
                throw new InvalidOperationException("It is not valid to request the next group until the previous group has run to completion");
            previousGroupFinished = false;
            bool startedIteratingCurrentGroup = false;
            yield return NextGroup();

            IEnumerable<T> NextGroup()
            {
                // A group reads from the shared enumerator, so it cannot be replayed.
                if (startedIteratingCurrentGroup)
                    throw new InvalidOperationException("This sequence doesn't support being iterated multiple times.");
                startedIteratingCurrentGroup = true;
                T previous;
                do
                {
                    yield return iterator.Current;
                    previous = iterator.Current;
                    sourceExhausted = !iterator.MoveNext();
                }
                while (!sourceExhausted && predicate(previous, iterator.Current));
                previousGroupFinished = true;
            }
        }
    }
}
Using it with your example is simple: your items are grouped while their first items are equal, but you can use any grouping condition you like.
var data = new[] { (0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2) };
var grouped = data.GroupWhile((previous, current) => previous.Item1 == current.Item1);
foreach (var group in grouped)
{
    Console.WriteLine(String.Join(", ", group));
}
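The two consumer rules are easy to break by accident, so it can help to see the guards fire. The sketch below is an illustration, not part of the original answer: it repeats the GroupWhile logic with shortened exception messages so that it compiles on its own, and `GroupWhileGuards` and `Throws` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupWhileGuards
{
    // Same logic as GroupWhile above, repeated only to keep this sketch self-contained.
    public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
        this IEnumerable<T> source, Func<T, T, bool> predicate)
    {
        using (var iterator = source.GetEnumerator())
        {
            bool previousGroupFinished = true;
            bool sourceExhausted = !iterator.MoveNext();
            while (!sourceExhausted)
            {
                if (!previousGroupFinished)
                    throw new InvalidOperationException("Previous group not finished.");
                previousGroupFinished = false;
                bool started = false;
                yield return NextGroup();

                IEnumerable<T> NextGroup()
                {
                    if (started)
                        throw new InvalidOperationException("Group iterated twice.");
                    started = true;
                    T previous;
                    do
                    {
                        yield return iterator.Current;
                        previous = iterator.Current;
                        sourceExhausted = !iterator.MoveNext();
                    }
                    while (!sourceExhausted && predicate(previous, iterator.Current));
                    previousGroupFinished = true;
                }
            }
        }
    }

    // Hypothetical helper: true if the action throws InvalidOperationException.
    public static bool Throws(Action action)
    {
        try { action(); return false; }
        catch (InvalidOperationException) { return true; }
    }

    public static void Main()
    {
        var data = new[] { (0, 0), (0, 1), (1, 0) };
        Func<(int, int), (int, int), bool> sameKey = (a, b) => a.Item1 == b.Item1;

        // Rule 1: ToList() asks for the second group before the first was drained.
        Console.WriteLine(Throws(() => data.GroupWhile(sameKey).ToList()));

        // Rule 2: the second Count() tries to enumerate the same group again.
        Console.WriteLine(Throws(() =>
        {
            foreach (var group in data.GroupWhile(sameKey))
            {
                group.Count();
                group.Count();
            }
        }));
    }
}
```

Both calls should report an exception. In practice this means anything that materializes the outer sequence first (`ToList`, `ToArray`, `OrderBy`, `Count` on the outer sequence) is off limits; consume each group fully inside the `foreach` body instead.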
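Rather than grouping on a predicate, it may be more convenient to group on some key object. In your example, the key is just the first item of the tuple. If computing the key is expensive, or the key cannot be computed more than once per item, you can change the grouping mechanism to use a key selector instead of a predicate and store the previous key instead of the previous item. The resulting code is very similar, with a few differences: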
public static IEnumerable<IEnumerable<TSource>> GroupAdjacent<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer = null)
{
    keyComparer = keyComparer ?? EqualityComparer<TKey>.Default;
    using (var iterator = source.GetEnumerator())
    {
        bool previousGroupFinished = true;
        bool sourceExhausted = !iterator.MoveNext();
        // Only read Current when the source is non-empty; it is undefined otherwise.
        TKey nextKey = sourceExhausted ? default(TKey) : keySelector(iterator.Current);
        while (!sourceExhausted)
        {
            if (!previousGroupFinished)
                throw new InvalidOperationException("It is not valid to request the next group until the previous group has run to completion");
            previousGroupFinished = false;
            bool startedIteratingCurrentGroup = false;
            yield return NextGroup();

            IEnumerable<TSource> NextGroup()
            {
                if (startedIteratingCurrentGroup)
                    throw new InvalidOperationException("This sequence doesn't support being iterated multiple times.");
                startedIteratingCurrentGroup = true;
                TKey previousKey;
                do
                {
                    yield return iterator.Current;
                    sourceExhausted = !iterator.MoveNext();
                    previousKey = nextKey;
                    // The key selector runs exactly once per item.
                    if (!sourceExhausted)
                        nextKey = keySelector(iterator.Current);
                }
                while (!sourceExhausted && keyComparer.Equals(previousKey, nextKey));
                previousGroupFinished = true;
            }
        }
    }
}
This lets you write:
var data = new[] { (0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2) };
var grouped = data.GroupAdjacent(pair => pair.Item1);
foreach (var group in grouped)
{
    Console.WriteLine(String.Join(", ", group));
}
Use MoreLinq's GroupAdjacent method:
Groups the adjacent elements of a sequence according to a specified key selector function.
var groupings = GetValues().GroupAdjacent(tuple => tuple.Item1);
foreach (var grouping in groupings)
{
    Console.WriteLine($"Value: {grouping.Key}. Elements: {string.Join(", ", grouping)}");
}
IEnumerable<(int, int)> GetValues()
{
    yield return (0, 0);
    yield return (0, 1);
    yield return (0, 2);
    yield return (0, 3);
    yield return (1, 0);
    yield return (1, 1);
    yield return (2, 0);
    yield return (2, 1);
    yield return (2, 2);
}
This prints the following:
Value: 0. Elements: (0, 0), (0, 1), (0, 2), (0, 3)
Value: 1. Elements: (1, 0), (1, 1)
Value: 2. Elements: (2, 0), (2, 1), (2, 2)
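Each element of the groupings enumerable is an IGrouping<int, (int, int)> that yields the desired result when enumerated.
(P.S.: in this implementation the iterator is only hit once, so it is entirely forward-only.)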