用LINQ做枢轴

  • 本文关键字:LINQ c# linq
  • 更新时间 :
  • 英文 :


我遇到了这个问题,我有一个CSV文件,格式如下(客户,购买的项目对):

customer1 item1
customer1 item2
customer1 item3
customer2 item4
customer2 item2
customer3 item5
customer3 item1
customer3 item2
customer4 item1
customer4 item2
customer5 item5
customer5 item1
现在,我希望在查询结果中显示:
item x; item y; how many customers have bought itemx and item together
例如:

item1 item2 3 (because cust1 and cust2 and cust3 bought item1 and item2 together)
item1 item5 1 (because cust5 and cust3 bought item1 and item5 together)

查询返回顾客成对购买的商品的所有可能组合。还要注意,Pair(x, y)与Pair(y, x)相同。

一个SQL查询应该像这样:

SELECT a1.item_id, a2.item_id, COUNT(a1.cust_id) AS how_many_custs_bought_both
  FROM data AS a1
INNER JOIN data AS a2
  ON a2.cust_id=a1.cust_id AND a2.item_id<>a1.item_id AND a1.item_id<a2.item_id
GROUP BY a1.item_id, a2.item_id

你如何在c#中做到这一点1)使用常规的for/foreach循环2)使用LINQ ?

我试着在LINQ做它,但卡住了,当我注意到LINQ不支持多个等于关键字在连接子句。然后我尝试使用普通循环,然而,它变得如此低效,每秒只能处理大约30行(CSV文件行)。

请建议!

使用LINQ(并遵循Tim回答的前5行)将链接方法语法与连接部分的查询语法相结合:

var custItems =  new [] {
    new { customer = 1, item = 1 },
    new { customer = 1, item = 2 },
    new { customer = 1, item = 3 },
    new { customer = 2, item = 4 },
    new { customer = 2, item = 2 },
    new { customer = 3, item = 5 },
    new { customer = 3, item = 1 },
    new { customer = 3, item = 2 },
    new { customer = 4, item = 1 },
    new { customer = 4, item = 2 },
    new { customer = 5, item = 5 },
    new { customer = 5, item = 1 }
};
};
var pairs = custItems.GroupBy(x => x.customer)
         .Where(g => g.Count() > 1)
         .Select(x =>  (from a in x.Select( y => y.item )
                        from b in x.Select( y => y.item )
                        where a < b //If you want to avoid duplicate (a,b)+(b,a)
                        // or just: where a != b, if you want to keep the dupes.
                        select new { a, b}))
         .SelectMany(x => x)
         .GroupBy(x => x)
         .Select(g => new { Pair = g.Key, Count = g.Count() })
         .ToList();
pairs.ForEach(x => Console.WriteLine(x));

EDIT:忘记OP想要的pair出现计数,添加了另一个。groupby()魔法。

EDIT:完成示例以显示输出内容:

{ Pair = { a = 1, b = 2 }, Count = 3 }
{ Pair = { a = 1, b = 3 }, Count = 1 }
{ Pair = { a = 2, b = 3 }, Count = 1 }
{ Pair = { a = 2, b = 4 }, Count = 1 }
{ Pair = { a = 1, b = 5 }, Count = 2 }
{ Pair = { a = 2, b = 5 }, Count = 1 }

EDIT:回滚并将字符串更改为整数,因为OP显示的数据集以整数作为id,这消除了.GetHashCode()

的需要

可能:

var lines = File.ReadLines(csvFilePath);
var custItems = lines
    .Select(l => new { split = l.Split() })
    .Select(x => new { customer = x.split[0].Trim(), item = x.split[1].Trim() })
    .ToList();
var groups = from ci1 in custItems
             join ci2 in custItems
             on ci1.customer equals ci2.customer
             where ci1.item != ci2.item 
             group new { Item1 = ci1.item, Item2 = ci2.item } by new { Item1 = ci1.item, Item2 = ci2.item } into ItemGroup
             select ItemGroup;
var result = groups.Select(g => new
{
    g.Key.Item1,
    g.Key.Item2,
    how_many_custs_bought_both = g.Count()
});

请注意,当文件由于自连接而很大时,ToList的物化非常重要。

{ Item1 = item1, Item2 = item2, how_many_custs_bought_both = 3 }
{ Item1 = item1, Item2 = item3, how_many_custs_bought_both = 1 }
{ Item1 = item2, Item2 = item1, how_many_custs_bought_both = 3 }
{ Item1 = item2, Item2 = item3, how_many_custs_bought_both = 1 }
{ Item1 = item3, Item2 = item1, how_many_custs_bought_both = 1 }
{ Item1 = item3, Item2 = item2, how_many_custs_bought_both = 1 }
{ Item1 = item4, Item2 = item2, how_many_custs_bought_both = 1 }
{ Item1 = item2, Item2 = item4, how_many_custs_bought_both = 1 }
{ Item1 = item5, Item2 = item1, how_many_custs_bought_both = 2 }
{ Item1 = item5, Item2 = item2, how_many_custs_bought_both = 1 }
{ Item1 = item1, Item2 = item5, how_many_custs_bought_both = 2 }
{ Item1 = item2, Item2 = item5, how_many_custs_bought_both = 1 }

你可以这样写:

IDictionary<int, int> pivotResult = customerItems.ToLookup(c => c.Customer)
                                 .ToDictionary(x=>x.Key, y=>y.Count());

工作LINQ的例子,不太漂亮!

using System;
using System.Collections.Generic;
using System.Linq;
class Data
{
    public Data(int cust, int item)
    {
        item_id = item;
        cust_id = cust;
    }
    public int item_id { get; set; }
    public int cust_id { get; set; }
    static void Main(string[] args)
    {
        var data = new List<Data>
                       {new Data(1,1),new Data(1,2),new Data(1,3),
                        new Data(2,4),new Data(2,2),new Data(3,5),
                        new Data(3,1),new Data(3,2),new Data(4,1),
                        new Data(4,2),new Data(5,5),new Data(5,1)};
           (from a1 in data
            from a2 in data
            where a2.cust_id == a1.cust_id && a2.item_id != a1.item_id && a1.item_id < a2.item_id
            group new {a1, a2} by new {item1 = a1.item_id, item2 = a2.item_id}
            into g
            select new {g.Key.item1, g.Key.item2, count = g.Count()})
            .ToList()
            .ForEach(x=>Console.WriteLine("{0} {1} {2}",x.item1,x.item2,x.count))
            ;
           Console.Read();
    }
}
输出:

1 2 3
1 3 1
2 3 1
2 4 1
1 5 2
2 5 1

相关内容

  • 没有找到相关文章

最新更新