SQL - How to aggregate results based on an aggregate and limit them to one row per ID



I have been trying to put together a SQL query for hours and can't seem to get it right. Consider the following example tables, Products and ProductCategories:

Products
--------
ProductId   ProductName
---------   -----------
1         | Achilles
2         | Hermes
3         | Apollo
4         | Zeus
5         | Poseidon
6         | Eros

ProductCategories
-----------------
ProductId   Category
---------   --------
1         | Wars
1         | Wars|Trojan
1         | Wars|Trojans|Mortals
1         | Toys|Games
2         | Travel
2         | Travel|Trade
2         | Communication|Language|Writing
5         | Oceanware
6         | Love
6         | Love|Candy
6         | Love|Valentines
3         | Sunshine
4         | Lightning

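The goal is to select the product ID, the product name, and one of the categories associated with the product, so that each product ID/name appears exactly once in the result and the selected category is the one with the most pipe characters. If two (or more) of a product's categories tie for the most pipes, picking any one of them at random is fine.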
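As a quick illustration of the "most pipe characters" rule, here is a minimal sketch that counts pipes by comparing a string's length with and without the `|` characters. It runs against an in-memory SQLite database from Python, which is an assumption made only so the snippet is runnable; SQLite calls the function LENGTH where SQL Server uses LEN, and the helper name `pipe_count` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Pipe count = length of the string minus its length once every '|' is removed.
# SQLite spells the function LENGTH; SQL Server spells it LEN.
QUERY = "SELECT LENGTH(:c) - LENGTH(REPLACE(:c, '|', ''))"


def pipe_count(category):
    """Return the number of '|' characters in category, computed in SQL."""
    return conn.execute(QUERY, {"c": category}).fetchone()[0]


for category in ["Wars", "Wars|Trojan", "Wars|Trojans|Mortals"]:
    print(category, pipe_count(category))
```

The same expression can be used in an ORDER BY or inside MAX() to rank the categories of each product.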

In other words, the query should produce the following data set:

ProductId   ProductName     Category
---------   -----------     --------
1         | Achilles      | Wars|Trojans|Mortals
2         | Hermes        | Communication|Language|Writing
3         | Apollo        | Sunshine
4         | Zeus          | Lightning
5         | Poseidon      | Oceanware
6         | Eros          | Love|Valentines

(Note that the category returned for Eros could just as well be Love|Candy; either is fine.)

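So far I have this SQL, which obviously doesn't work, because it returns one row for every product/category combination rather than just the category with the most pipes: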

SELECT
    ProductId,
    ProductName,
    Category,
    MAX(PipeCount)
FROM
(
    SELECT DISTINCT
        p.ProductId AS ProductId,
        p.ProductName AS ProductName,
        c.Category AS Category,
        LEN(c.Category) - LEN(REPLACE(c.Category, '|', '')) AS PipeCount
    FROM
        Products p
        INNER JOIN ProductCategories c
            ON p.ProductId = c.ProductId
) Subquery
GROUP BY ProductId, ProductName, Category, PipeCount

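However, I can't seem to get any closer than this. I need only one row returned per product, where PipeCount is the maximum PipeCount of any row for that product. Any help would be appreciated. Note that this is not my actual data; the real data is considerably more complex, but this example should suffice. I am working with SQL Server 2012, but ideally a good answer would work with almost any flavor of SQL.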
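To make the symptom described in the question concrete, the following sketch reproduces it in simplified form against an in-memory SQLite database (an assumption for runnability; LENGTH stands in for SQL Server's LEN). Grouping by Category keeps every product/category pair as its own group, while dropping Category from the GROUP BY yields one row per product but loses the category text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductId INTEGER, ProductName TEXT);
CREATE TABLE ProductCategories (ProductId INTEGER, Category TEXT);
INSERT INTO Products VALUES
  (1,'Achilles'),(2,'Hermes'),(3,'Apollo'),(4,'Zeus'),(5,'Poseidon'),(6,'Eros');
INSERT INTO ProductCategories VALUES
  (1,'Wars'),(1,'Wars|Trojan'),(1,'Wars|Trojans|Mortals'),(1,'Toys|Games'),
  (2,'Travel'),(2,'Travel|Trade'),(2,'Communication|Language|Writing'),
  (5,'Oceanware'),(6,'Love'),(6,'Love|Candy'),(6,'Love|Valentines'),
  (3,'Sunshine'),(4,'Lightning');
""")

PIPES = "LENGTH(c.Category) - LENGTH(REPLACE(c.Category, '|', ''))"

# Category is part of the grouping key, so each pair forms its own group.
per_pair = conn.execute(f"""
SELECT p.ProductId, p.ProductName, c.Category, MAX({PIPES}) AS PipeCount
FROM Products p
INNER JOIN ProductCategories c ON p.ProductId = c.ProductId
GROUP BY p.ProductId, p.ProductName, c.Category
""").fetchall()

# Without Category in the grouping key there is one row per product,
# but only the maximum pipe count survives, not the category itself.
per_product = conn.execute(f"""
SELECT p.ProductId, p.ProductName, MAX({PIPES}) AS MaxPipeCount
FROM Products p
INNER JOIN ProductCategories c ON p.ProductId = c.ProductId
GROUP BY p.ProductId, p.ProductName
ORDER BY p.ProductId
""").fetchall()

print(len(per_pair), "rows when grouping by category")
print(len(per_product), "rows when grouping by product only")
```

The second result is the missing piece: once the maximum pipe count per product is known, the remaining task is to pick one category row that matches it.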

You can use ROW_NUMBER to get the CategoryName with the most pipes for each ProductId:


SELECT
    p.*,
    pc.CategoryName
FROM Products p
INNER JOIN (
    SELECT
        *,
        RN = ROW_NUMBER() OVER(PARTITION BY ProductId ORDER BY LEN(CategoryName) - LEN(REPLACE(CategoryName, '|', '')) DESC)
    FROM ProductCategories
) pc
    ON pc.ProductId = p.ProductId
WHERE RN = 1
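For readers without a SQL Server instance at hand, the same ROW_NUMBER idea can be tried in SQLite from Python. This is a sketch under a few assumptions: the bundled SQLite is version 3.25 or later (needed for window functions), LENGTH replaces LEN, and the T-SQL `RN = ...` alias form is rewritten as `... AS RN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductId INTEGER, ProductName TEXT);
CREATE TABLE ProductCategories (ProductId INTEGER, CategoryName TEXT);
INSERT INTO Products VALUES
  (1,'Achilles'),(2,'Hermes'),(3,'Apollo'),(4,'Zeus'),(5,'Poseidon'),(6,'Eros');
INSERT INTO ProductCategories VALUES
  (1,'Wars'),(1,'Wars|Trojan'),(1,'Wars|Trojans|Mortals'),(1,'Toys|Games'),
  (2,'Travel'),(2,'Travel|Trade'),(2,'Communication|Language|Writing'),
  (5,'Oceanware'),(6,'Love'),(6,'Love|Candy'),(6,'Love|Valentines'),
  (3,'Sunshine'),(4,'Lightning');
""")

# Number the categories of each product from most pipes to fewest,
# then keep only the first row of each partition.
rows = conn.execute("""
SELECT p.ProductId, p.ProductName, pc.CategoryName
FROM Products p
INNER JOIN (
    SELECT ProductId,
           CategoryName,
           ROW_NUMBER() OVER (
               PARTITION BY ProductId
               ORDER BY LENGTH(CategoryName)
                        - LENGTH(REPLACE(CategoryName, '|', '')) DESC
           ) AS RN
    FROM ProductCategories
) pc ON pc.ProductId = p.ProductId
WHERE pc.RN = 1
ORDER BY p.ProductId
""").fetchall()

for row in rows:
    print(row)
```

For Eros, either Love|Candy or Love|Valentines may come back, because the ORDER BY does not distinguish between categories with the same pipe count.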

Here is a solution using ROW_NUMBER:

--CTE as data sample for two tables
;
WITH Products
AS ( SELECT *
     FROM ( VALUES ( 1, 'Achilles'), ( 2, 'Hermes'),
                   ( 3, 'Apollo'), ( 4, 'Zeus'), ( 5, 'Poseidon'),
                   ( 6, 'Eros') ) AS t ( ProductId, ProductName )
   ),
ProductCategories
AS ( SELECT *
     FROM ( VALUES ( 1, 'Wars'), ( 1, 'Wars|Trojan'),
                   ( 1, 'Wars|Trojans|Mortals'), ( 1, 'Toys|Games'),
                   ( 2, 'Travel'), ( 2, 'Travel|Trade'),
                   ( 2, 'Communication|Language|Writing'),
                   ( 5, 'Oceanware'), ( 6, 'Love'),
                   ( 6, 'Love|Candy'), ( 6, 'Love|Valentines'),
                   ( 3, 'Sunshine'), ( 4, 'Lightning') ) AS T ( ProductId, CategoryName )
   )
--Final Query
SELECT T.ProductId,
       T.ProductName,
       T.CategoryName
FROM ( SELECT P.ProductID,
              P.ProductName,
              C.CategoryName,
              LEN(C.CategoryName) - LEN(REPLACE(C.CategoryName, '|', '')) AS Pipes,
              ROW_NUMBER() OVER ( PARTITION BY P.ProductID
                                  ORDER BY LEN(C.CategoryName) - LEN(REPLACE(C.CategoryName, '|', '')) DESC,
                                           LEN(C.CategoryName) DESC ) AS RN
       FROM Products AS P
            JOIN ProductCategories AS C ON P.ProductId = C.ProductId
     ) AS T
WHERE T.RN = 1
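The query above adds a second sort key, the total string length, so that categories with the same pipe count are ranked longest first. The sketch below isolates that tie-break for the Eros rows, again using SQLite from Python as an assumed stand-in for SQL Server (LENGTH instead of LEN, SQLite 3.25 or later for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductCategories (ProductId INTEGER, CategoryName TEXT);
INSERT INTO ProductCategories VALUES
  (6,'Love'),(6,'Love|Candy'),(6,'Love|Valentines');
""")

# First key: pipe count, descending. Second key: string length, descending.
# 'Love|Candy' and 'Love|Valentines' tie on pipes, so length decides.
ranked = conn.execute("""
SELECT CategoryName,
       ROW_NUMBER() OVER (
           PARTITION BY ProductId
           ORDER BY LENGTH(CategoryName)
                    - LENGTH(REPLACE(CategoryName, '|', '')) DESC,
                    LENGTH(CategoryName) DESC
       ) AS RN
FROM ProductCategories
ORDER BY RN
""").fetchall()

for name, rn in ranked:
    print(rn, name)
```

With this ordering the choice for Eros is repeatable. Two categories that tie on both pipe count and length would still be ordered arbitrarily, so a further key such as the category text itself could be appended if full determinism matters.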

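I ended up solving this with several subqueries. One caveat: it depends on the ProductCategories table having a unique column, which I did not show in the example. In my real data that column already exists, but anyone facing a similar problem could add such a column to make this solution work. Here is the SQL: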

SELECT
    Sub1.ProductId,
    Sub3.Category
FROM
(
    SELECT
        o.ProductId AS ProductId,
        MAX(LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category)) AS MaxPipeCount
    FROM
        Products o
        INNER JOIN ProductCategories c
            ON o.ProductId = c.ProductId
    GROUP BY o.ProductID
) Sub1
INNER JOIN
(
    SELECT
        o.ProductId AS ProductId,
        LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category) AS PipeCount,
        MAX(c.UniqueId) AS MaxUniqueId
    FROM
        Products o
        INNER JOIN ProductCategories c
            ON o.ProductId = c.ProductId
    GROUP BY o.ProductID, LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category)
) Sub2
    ON Sub1.MaxPipeCount = Sub2.PipeCount
    AND Sub1.ProductId = Sub2.ProductId
INNER JOIN
(
    SELECT DISTINCT
        o.ProductId,
        c.Category,
        LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category) AS PipeCount,
        c.UniqueId
    FROM
        Products o
        INNER JOIN ProductCategories c
            ON o.ProductId = c.ProductId
) Sub3
    ON Sub1.MaxPipeCount = Sub3.PipeCount
    AND Sub2.MaxUniqueId = Sub3.UniqueId
    AND Sub1.ProductId = Sub3.ProductId
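The subquery approach needs no window functions, which makes it a candidate for engines that lack them. Below is a condensed sketch of the same idea rather than the exact query above: find the maximum pipe count per product, take the highest UniqueId among the rows that reach it, and join back to fetch the category. It runs in SQLite from Python (LENGTH instead of LEN), and the sequential UniqueId values are an assumption, since the original example table does not show that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductId INTEGER, ProductName TEXT);
-- UniqueId values are assumed here; any unique column would do.
CREATE TABLE ProductCategories (UniqueId INTEGER, ProductId INTEGER, Category TEXT);
INSERT INTO Products VALUES
  (1,'Achilles'),(2,'Hermes'),(3,'Apollo'),(4,'Zeus'),(5,'Poseidon'),(6,'Eros');
INSERT INTO ProductCategories VALUES
  (1,1,'Wars'),(2,1,'Wars|Trojan'),(3,1,'Wars|Trojans|Mortals'),(4,1,'Toys|Games'),
  (5,2,'Travel'),(6,2,'Travel|Trade'),(7,2,'Communication|Language|Writing'),
  (8,5,'Oceanware'),(9,6,'Love'),(10,6,'Love|Candy'),(11,6,'Love|Valentines'),
  (12,3,'Sunshine'),(13,4,'Lightning');
""")

rows = conn.execute("""
SELECT p.ProductId, p.ProductName, c.Category
FROM Products p
INNER JOIN (
    -- Step 2: among the rows that reach the maximum, keep the highest UniqueId.
    SELECT pc.ProductId, MAX(pc.UniqueId) AS MaxUniqueId
    FROM ProductCategories pc
    INNER JOIN (
        -- Step 1: maximum pipe count per product.
        SELECT ProductId,
               MAX(LENGTH(REPLACE(Category, '|', '||')) - LENGTH(Category)) AS MaxPipeCount
        FROM ProductCategories
        GROUP BY ProductId
    ) m ON m.ProductId = pc.ProductId
       AND LENGTH(REPLACE(pc.Category, '|', '||')) - LENGTH(pc.Category) = m.MaxPipeCount
    GROUP BY pc.ProductId
) pick ON pick.ProductId = p.ProductId
-- Step 3: fetch the category text for the chosen row.
INNER JOIN ProductCategories c ON c.UniqueId = pick.MaxUniqueId
ORDER BY p.ProductId
""").fetchall()

for row in rows:
    print(row)
```

Because ties are broken by the largest UniqueId, the result is deterministic for a given table: with the assumed IDs, Eros resolves to Love|Valentines.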