用于批量提取结果的IEnumerable扩展



我正在使用实体框架,经常遇到需要迭代大量记录的问题。我的问题是,如果我一次把它们都拔出来,我就有可能超时;如果我一次提取一个,那么实际上每个记录都将是一个单独的查询,并且需要花费很长时间。

我想实现一个Linq扩展,它可以批量提取结果,但仍然可以用作IEnumerable。我会给它一组键(很可能是我提取的任何记录的主ID)、一个批大小(简单对象的大小更大,复杂对象的大小更低),以及一个Func,它定义了如何将一组键应用于一组记录类型T。我会这样称呼它:

//get the list of items to pull--in this case, a set of order numbers
List<int> orderNumbers = GetOrderNumbers();
//set the batch size
int batchSize = 100;
//loop through the set using BatchedSelector extension. Note the selection
//function at the end which allows me to 
foreach (var order in dbContext.Orders.BatchedSelector(repairNumbers, batchSize, (o, k) => k.Contains(o.OrderNumber)))
{
    //do things
}

这是我的解决方案草案:

    /// <summary>
    /// A Linq extension that fetches IEnumerable results in batches, aggregating queries
    /// to improve EF performance. Operates transparently to application and acts like any
    /// other IEnumerable.
    /// </summary>
    /// <typeparam name="T">Header record type</typeparam>
    /// <param name="source">Full set of records</param>
    /// <param name="keys">The set of keys that represent specific records to pull</param>
    /// <param name="selector">Function that filters the result set to only those which match the key set</param>
    /// /// <param name="maxBatchSize">Maximum number of records to pull in one query</param>
    /// <returns></returns>
    public static IEnumerable<T> BatchedSelector<T>(this IEnumerable<T> source, IEnumerable<int> keys, Func<T, IEnumerable<int>, bool> selector, int maxBatchSize)
    {
        //the index of the next key (or set of keys) to process--we start at 0 of course
        int currentKeyIndex = 0;             
        //to provide some resiliance, we will allow the batch size to decrease if we encounter errors
        int currentBatchSize = maxBatchSize;
        int batchDecreaseAmount = Math.Max(1, maxBatchSize / 10); //10%, but at least 1
        //other starting variables; a list to hold results and the associated batch of keys
        List<T> resultList = null;
        IEnumerable<int> keyBatch = null;
        //while there are still keys remaining, grab the next set of keys
        while ((keyBatch = keys.Skip(currentKeyIndex).Take(currentBatchSize)).Count() > 0)
        {
            //try to fetch the results
            try
            {
                resultList = source.Where(o => selector(o, keyBatch)).ToList();  // <-- this is where errors occur
                currentKeyIndex += maxBatchSize;  //increment key index to mark these keys as processed
            }
            catch
            {
                //decrease the batch size for our retry
                currentBatchSize -= batchDecreaseAmount;
                //if we've run out of batch overhead, throw the error
                if (currentBatchSize <= 0) throw;
                //otherwise, restart the loop
                continue;
            }
            //since we've successfully gotten the set of keys, yield the results
            foreach (var match in resultList) yield return match;
        }
        //the loop is over; we're done
        yield break;
    }

出于某种原因,"何处"条款无效。我已经验证了keyBatch中是否有正确的密钥,但预期的WHERE OrderNumber IN (k1, k2, k3, kn)行不在那里。就好像我根本没有在哪里声明。

我的最佳猜测是,我需要构建表达式并对其进行编译,但我不确定这是否是问题所在,也不确定如何修复它。我喜欢任何输入。谢谢

WhereSkipTake和所有这类方法都是扩展方法,而不是IEnumerable<T>的成员。对于所有这些方法,实际上有两个版本,一个用于IEnumerable<>,一个适用于IQueryable<>

可枚举扩展

  • Where(Func<TSource, bool> predicate)
  • Select(Func<TSource, TResult> selector)

可查询扩展

  • Where(Expression<Func<TSource, bool>> predicate)
  • Select(Expression<Func<TSource, TResult>> predicate)

正如您所看到的,不同之处在于Queryable扩展采用Expression<>而不是直接委托。这些表达式允许EF将代码转换为SQL

由于您在BatchedSelector()方法中将变量/参数声明为IEnumerable<>,因此您正在使用Enumerable类中的扩展,并且这些扩展是在内存中执行的。

一个常见的错误是,由于多态性,DbSetIQueryable<>)无论您将其用作IEnumerable<>,查询都将被转换为SQL,这仅适用于适当的成员,而不适用于扩展方法。

您的代码可以修复将IEnumerable<>变量/参数更改为IQueryable<>的问题。

您可以在这里阅读更多关于IEnumerableIQueryable之间的差异。

首先,感谢Arturo。你让我走上了这个解决方案的正确轨道。我认为这是一个Linq->实体的问题,但这些问题对我来说还远远不能直观地解决。

其次,我大量借鉴了希米对这个问题的回答。谢谢Shimmy!

首先,我更新了方法以支持整数以外的键类型,因为为什么不呢。因此,方法签名现在是(注意对IQueryable源的更改):

public static IEnumerable<T> BatchedSelector<T, TKey>(this IQueryable<T> source, Expression<Func<T, TKey>> selector, IEnumerable<TKey> keys, int maxBatchSize)

除了产生错误的线路外,该方法基本保持不变,现在用代替

resultList = source.WhereIn(selector, keyBatch).ToList();

WhereIn是一个Linq扩展,主要借用自Shimmy:

    public static IQueryable<T> WhereIn<T, TKey>(this IQueryable<T> source, Expression<Func<T, TKey>> selector, IEnumerable<TKey> keyCollection)
    {
        if (selector == null) throw new ArgumentNullException("Null selector");
        if (keyCollection == null) throw new ArgumentNullException("Null collection");
        //if no items in collection, no results
        if (!keyCollection.Any()) return source.Where(t => false);
        //assemble expression
        var p = selector.Parameters.Single();
        var equals = keyCollection.Select(value => (Expression)Expression.Equal(selector.Body, Expression.Constant(value, typeof(TKey))));
        var body = equals.Aggregate((accumulate, equal) => Expression.Or(accumulate, equal));
        //return expression
        return source.Where(Expression.Lambda<Func<T, bool>>(body, p));
    }

这教会了我一些非常酷的东西:如果你提供一组常量比较的where子句,它将被转换为SQL In语句!整洁的

有了这些变化,该方法可以快速轻松地产生结果。

最新更新