C#: Aggregate values into a List property



Suppose a large used-car dealer (e.g. CarMax) has many dealerships in different states.

public class Inventory
{
    public string Make { get; set; }
    public string Model { get; set; }
    public string Year { get; set; }
    public string State { get; set; }
}
List<Inventory> cars = new List<Inventory>();
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "MT" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "AR" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "OH" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "AZ" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "OR" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "MN" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "KY" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2020", State = "FL" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2020", State = "GA" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2010", State = "TN" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2010", State = "WY" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2012", State = "WY" } );

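From cars, I want to create a list of objects containing every combination of the Make, Model, and Year properties, where each object has a States property listing all of the states that have a vehicle matching those three properties: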

var vehicles[0]: Make = "Ford", Model = "F-150", Year = "2014", States = { "MT", "AR", "OH" }
var vehicles[1]: Make = "Ford", Model = "F-150", Year = "2015", States = { "AZ", "OR", "MN", "KY" }
var vehicles[2]: Make = "Ford", Model = "F-150", Year = "2020", States = { "FL", "GA" }
var vehicles[3]: Make = "Ford", Model = "Ranger", Year = "2010", States = { "TN", "WY" }
var vehicles[4]: Make = "Ford", Model = "Ranger", Year = "2012", States = { "WY" }

I have looked into a self-joining GroupJoin LINQ method as well as nested LINQ queries, but I hit a wall when trying to build the States sub-collection along the way.

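One catch: the dataset I am working with contains over 100 million rows (Make/Model/Year/State combinations in this example) and has 6 grouping properties rather than 3. So, if at all possible, I would like to avoid any multi-step process that iterates over the data more than once.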

Consider a Dictionary whose key is the combination of Make, Model, and Year, and whose value is a set of strings (the states):

var result = new Dictionary<(string make, string model, string year), HashSet<string>>();

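The important point is that the dictionary cannot hold more than one entry per make/model/year, and the hash set cannot hold the same state more than once. Once you have the right collection types, the code to populate them pretty much writes itself: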

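// Single pass over the data: one dictionary lookup per row.
foreach (var item in cars)
{
    var key = (item.Make, item.Model, item.Year);
    if (!result.TryGetValue(key, out var states))
    {
        // First time this make/model/year is seen: start a new set of states.
        states = new HashSet<string>();
        result.Add(key, states);
    }
    states.Add(item.State);
}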
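To get from the dictionary to the list of objects described in the question, one more projection is needed. The following is a minimal end-to-end sketch, not a definitive implementation. It assumes the rows are modeled as tuples instead of the Inventory class (only to keep the example self-contained), and the names `vehicles` and `States` are chosen here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample rows as tuples (an assumption for brevity; the Inventory class
// from the question can be used in exactly the same way).
var cars = new List<(string Make, string Model, string Year, string State)>
{
    ("Ford", "F-150", "2014", "MT"),
    ("Ford", "F-150", "2014", "AR"),
    ("Ford", "F-150", "2014", "OH"),
    ("Ford", "Ranger", "2010", "TN"),
    ("Ford", "Ranger", "2010", "WY"),
    ("Ford", "Ranger", "2012", "WY"),
    ("Ford", "Ranger", "2012", "WY") // duplicate row, absorbed by the HashSet
};

var result = new Dictionary<(string Make, string Model, string Year), HashSet<string>>();

// Single pass over the input: one lookup per row.
foreach (var car in cars)
{
    var key = (car.Make, car.Model, car.Year);
    if (!result.TryGetValue(key, out var states))
    {
        states = new HashSet<string>();
        result.Add(key, states);
    }
    states.Add(car.State);
}

// Project each dictionary entry into the shape asked for in the question.
// This walks the (much smaller) set of groups, not the original rows.
var vehicles = result
    .Select(kv => new
    {
        kv.Key.Make,
        kv.Key.Model,
        kv.Key.Year,
        States = kv.Value.ToList()
    })
    .ToList();

foreach (var v in vehicles)
{
    Console.WriteLine($"{v.Make} {v.Model} {v.Year}: {string.Join(", ", v.States)}");
}
```

With six grouping properties, the same pattern should carry over by widening the tuple key to six elements.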

I think we can use GroupBy with a lambda, plus string.Join, to collect the states for each group; this should not need GroupJoin at all.

cars.GroupBy(x => new { x.Make, x.Model, x.Year })
    .Select(x => new {
        x.Key.Make,
        x.Key.Year,
        x.Key.Model,
        State = string.Join(",", x.Select(z => z.State))
    });
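The question asks for a collection of states rather than a comma-separated string, so a variation on the query above may be closer to the requested shape. This is only a sketch under a few assumptions: rows are modeled as tuples instead of the Inventory class, and the `grouped` variable and `States` property are names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample rows as tuples (an assumption for brevity).
var cars = new List<(string Make, string Model, string Year, string State)>
{
    ("Ford", "F-150", "2014", "MT"),
    ("Ford", "F-150", "2014", "AR"),
    ("Ford", "F-150", "2014", "OH"),
    ("Ford", "Ranger", "2010", "TN"),
    ("Ford", "Ranger", "2010", "WY"),
    ("Ford", "Ranger", "2012", "WY"),
    ("Ford", "Ranger", "2012", "WY") // duplicate row
};

var grouped = cars
    // Anonymous types compare by value, so they work as a composite key.
    .GroupBy(c => new { c.Make, c.Model, c.Year })
    .Select(g => new
    {
        g.Key.Make,
        g.Key.Model,
        g.Key.Year,
        // Keep the states as a list; Distinct() drops repeated states.
        States = g.Select(c => c.State).Distinct().ToList()
    })
    .ToList();

foreach (var v in grouped)
{
    Console.WriteLine($"{v.Make} {v.Model} {v.Year}: {string.Join(", ", v.States)}");
}
```

Note that GroupBy in LINQ to Objects buffers the whole input before yielding the first group, which is worth keeping in mind for a dataset of the size mentioned in the question.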

If you want it to be faster, we can try PLINQ instead.

Edit

If your dataset is huge, PLINQ can run the LINQ query in parallel for us, as shown below:

cars.AsParallel()
    .GroupBy(x => new { x.Make, x.Model, x.Year })
    .Select(x => new {
        x.Key.Make,
        x.Key.Year,
        x.Key.Model,
        State = string.Join(",", x.Select(z => z.State))
    })
    .ToList(); // materialize the parallel query
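One thing to watch with PLINQ is that the order of the groups is not guaranteed unless it is requested. The sketch below sorts the groups and the states explicitly so the output is deterministic. It assumes tuple rows instead of the Inventory class, and `parallelGroups` and `States` are illustrative names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample rows as tuples (an assumption for brevity).
var cars = new List<(string Make, string Model, string Year, string State)>
{
    ("Ford", "Ranger", "2012", "WY"),
    ("Ford", "F-150", "2014", "MT"),
    ("Ford", "Ranger", "2010", "TN"),
    ("Ford", "F-150", "2014", "AR"),
    ("Ford", "Ranger", "2010", "WY"),
    ("Ford", "F-150", "2014", "OH")
};

var parallelGroups = cars
    .AsParallel()
    .GroupBy(c => new { c.Make, c.Model, c.Year })
    .Select(g => new
    {
        g.Key.Make,
        g.Key.Model,
        g.Key.Year,
        // Sort the states so each list is stable across runs.
        States = g.Select(c => c.State)
                  .Distinct()
                  .OrderBy(s => s, StringComparer.Ordinal)
                  .ToList()
    })
    // Sort the groups as well; without this the group order may vary.
    .OrderBy(v => v.Make, StringComparer.Ordinal)
    .ThenBy(v => v.Model, StringComparer.Ordinal)
    .ThenBy(v => v.Year, StringComparer.Ordinal)
    .ToList();

foreach (var v in parallelGroups)
{
    Console.WriteLine($"{v.Make} {v.Model} {v.Year}: {string.Join(", ", v.States)}");
}
```

Whether the parallel version is actually faster depends on the data and the machine, so it is worth measuring against the sequential query before committing to it.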
