来自我的导师:更喜欢本机方法(直接在集合上实现)而不是IEnumerable的扩展方法,因为:
LINQ 到对象扩展方法 在 IEnumerable 上实现, 这意味着在最坏的情况下 方案(当您搜索的项目时 集合中不存在)您 必须通过所有枚举 元素。如果您有包含或 存在方法直接在 上实现 收藏,它可以利用 内部知识,也许只是做一个 哈希表查找或其他一些快速 操作。
我非常困惑,因为我认为Microsoft应该已经为 IEnumerable Contains/Exists 实现了哈希表。使用List和IEnumerable的快速基准测试显示没有区别:
static void Main(string[] args)
{
Console.Write("input the number of elements: ");
int count = Convert.ToInt32(Console.ReadLine());
Console.Write("input the number of loops: ");
int loop = Convert.ToInt32(Console.ReadLine());
Random r = new Random();
Stopwatch sw = new Stopwatch();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt(count);
sw.Start();
for (int j = 0; j < count; j++)
{
DoContains(list, r.Next());
}
sw.Stop();
}
Console.WriteLine("List<T> native method: Iterated {0} times on {1} elements, elapsed :{2}",loop,count,sw.Elapsed);
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt(count);
sw.Start();
for (int j = 0; j < count; j++)
{
DoContainsEnumerable(list, r.Next());
}
sw.Stop();
}
Console.WriteLine("IEnumerable<T> extension method: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt2(count);
sw.Start();
for (int j = 0; j < count; j++)
{
//make sure that the element is not in the list
DoContains(list, r.Next(20000, 50000));
}
sw.Stop();
}
Console.WriteLine("List<T> native method: element does not exist:Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt2(count);
sw.Start();
for (int j = 0; j < count; j++)
{
//make sure that the element is not in the list
DoContainsEnumerable(list, r.Next(20000, 50000));
}
sw.Stop();
}
Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
Console.ReadKey();
}
static List<int> CreateListOfInt(int count)
{
Random r = new Random(1000);
List<int> numbers = new List<int>(count);
for (int i = 0; i < count; i++)
{
numbers.Add(r.Next());
}
return numbers;
}
static bool DoContains(List<int> list, int number)
{
return list.Contains(number);
}
static bool DoContainsEnumerable(IEnumerable<int> list, int number)
{
return list.Contains(number);
}
//define the scope of randomly created number, to make sure that lookup number will not in the List
static List<int> CreateListOfInt2(int count)
{
Random r = new Random(1000);
List<int> numbers = new List<int>(count);
for (int i = 0; i < count; i++)
{
numbers.Add(r.Next(0,10000));
}
return numbers;
}
}
编辑:我尝试了HashSet实现,这大大提高了性能:
sw.Reset();
for (int i = 0; i < loop; i++)
{
var list = CreateListOfInt2(count);
HashSet<int> hashtable = new HashSet<int>(list);
sw.Start();
for (int j = 0; j < count; j++)
{
//make sure that the element is not in the list
hashtable.Contains(r.Next(20000, 50000));
}
sw.Stop();
}
Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
不过,你对我导师说的话有什么看法?
谁能为我清理出来?我的导师是对的吗?如果他是对的,我的代码有什么问题?
谢谢
List<T>
Contains
调用只是迭代列表,因此它们不会比扩展方法快。如果您使用HashSet<T>
并尝试一系列Contains()
操作,您会发现明显的改进。
编辑:Microsoft没有为IEnumerable<T>
扩展方法使用哈希的原因是他们无法保证实现类使用哈希或类似的东西。他们不得不采用幼稚的方法,因为IEnumerable<T>
接口只能保证枚举实现类。
如果 LINQ 版本在对象上具有更快的本机实现,则改用该更快的实现。
例如,Count
是这样实现的:
if (source is Array)
return source.Length;
if (source is ICollection)
return source.Count;
// else iterate through all the items and count them.
Contains
像这样:
if (source is ICollection)
return source.Contains(item);
// else iterate through the enumerable, and see if item exists
由于HashSet<T>
实现ICollection<T>
因此使用本机包含。
因此,LINQ 已针对标准接口进行了优化。但是,如果自定义类型的本机调用不是默认接口的一部分,则 LINQ 调用可能会变慢。