I have been trying to put together a SQL query for a few hours and can't seem to get it right. Consider the following sample tables, Products and ProductCategories:
Products
--------
ProductId | ProductName
--------- | -----------
1         | Achilles
2         | Hermes
3         | Apollo
4         | Zeus
5         | Poseidon
6         | Eros

ProductCategories
-----------------
ProductId | Category
--------- | --------
1         | Wars
1         | Wars|Trojan
1         | Wars|Trojans|Mortals
1         | Toys|Games
2         | Travel
2         | Travel|Trade
2         | Communication|Language|Writing
5         | Oceanware
6         | Love
6         | Love|Candy
6         | Love|Valentines
3         | Sunshine
4         | Lightning
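For anyone who wants to reproduce the example, the two tables could be created roughly as follows. This is only a sketch: the column types (INT, VARCHAR(100)) and the primary key are assumptions, since the types are not stated anywhere in this post.

```sql
-- Column types and the primary key are assumed, not taken from the real schema.
CREATE TABLE Products (
    ProductId   INT          NOT NULL PRIMARY KEY,
    ProductName VARCHAR(100) NOT NULL
);

CREATE TABLE ProductCategories (
    ProductId INT          NOT NULL,
    Category  VARCHAR(100) NOT NULL
);

-- Sample rows copied from the tables above.
INSERT INTO Products (ProductId, ProductName) VALUES
    (1, 'Achilles'), (2, 'Hermes'), (3, 'Apollo'),
    (4, 'Zeus'), (5, 'Poseidon'), (6, 'Eros');

INSERT INTO ProductCategories (ProductId, Category) VALUES
    (1, 'Wars'), (1, 'Wars|Trojan'), (1, 'Wars|Trojans|Mortals'), (1, 'Toys|Games'),
    (2, 'Travel'), (2, 'Travel|Trade'), (2, 'Communication|Language|Writing'),
    (5, 'Oceanware'),
    (6, 'Love'), (6, 'Love|Candy'), (6, 'Love|Valentines'),
    (3, 'Sunshine'), (4, 'Lightning');
```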
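The goal is to select the product ID, the product name, and one category associated with the product, so that each product ID/name appears exactly once in the result and the selected category is the one with the most pipe characters. If two or more of a product's categories tie for the most pipes, picking any one of them is fine.
In other words, the query should produce the following result set:
ProductId | ProductName | Category
--------- | ----------- | --------
1         | Achilles    | Wars|Trojans|Mortals
2         | Hermes      | Communication|Language|Writing
3         | Apollo      | Sunshine
4         | Zeus        | Lightning
5         | Poseidon    | Oceanware
6         | Eros        | Love|Valentines
(Note that the category returned for Eros could equally be Love|Candy; either is fine.)
So far I have this SQL, which obviously doesn't work, because it returns one row for every product/category combination rather than only the category with the most pipes: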
SELECT
    ProductId,
    ProductName,
    Category,
    MAX(PipeCount)
FROM
(
    SELECT DISTINCT
        p.ProductId AS ProductId,
        p.ProductName AS ProductName,
        c.Category AS Category,
        -- number of '|' characters in the category
        LEN(c.Category) - LEN(REPLACE(c.Category, '|', '')) AS PipeCount
    FROM
        Products p
        INNER JOIN ProductCategories c
            ON p.ProductId = c.ProductId
) Subquery
GROUP BY ProductId, ProductName, Category, PipeCount
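However, I can't seem to get any closer than this. I want only one row per product, namely the row whose PipeCount is the maximum PipeCount among that product's rows. Any help would be appreciated. Note that this is not my actual data, which is far more complex, but this example should suffice. I am working with SQL Server 2012, but I'm hoping a good answer would be compatible with almost any flavor of SQL.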
You can use ROW_NUMBER to get the CategoryName with the highest number of pipes for each ProductId:
SELECT
    p.*,
    pc.CategoryName
FROM Products p
INNER JOIN (
    SELECT
        *,
        RN = ROW_NUMBER() OVER (
            PARTITION BY ProductId
            ORDER BY LEN(CategoryName) - LEN(REPLACE(CategoryName, '|', '')) DESC
        )
    FROM ProductCategories
) pc
    ON pc.ProductId = p.ProductId
WHERE RN = 1
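Since the question asks for something that works outside SQL Server too, here is a hedged sketch of the same idea in more portable syntax. It assumes the category column is named Category, as in the question's table, and that the target database supports window functions and spells the length function LENGTH (as PostgreSQL, Oracle, SQLite and MySQL do); older versions of some databases, for example MySQL before 8.0, lack ROW_NUMBER entirely.

```sql
SELECT
    p.ProductId,
    p.ProductName,
    pc.Category
FROM Products p
INNER JOIN (
    SELECT
        ProductId,
        Category,
        -- Rank each product's categories by pipe count, highest first.
        -- "expr AS RN" replaces the T-SQL-only "RN = expr" alias form.
        ROW_NUMBER() OVER (
            PARTITION BY ProductId
            ORDER BY LENGTH(Category) - LENGTH(REPLACE(Category, '|', '')) DESC
        ) AS RN
    FROM ProductCategories
) pc
    ON pc.ProductId = p.ProductId
WHERE pc.RN = 1;
```

Ties are broken arbitrarily here, which the question explicitly allows; adding a second ORDER BY key would make the choice deterministic.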
Here is a solution using ROW_NUMBER:
-- CTEs providing sample data for the two tables
;
WITH Products AS (
    SELECT *
    FROM (VALUES (1, 'Achilles'), (2, 'Hermes'), (3, 'Apollo'),
                 (4, 'Zeus'), (5, 'Poseidon'), (6, 'Eros')
         ) AS t (ProductId, ProductName)
),
ProductCategories AS (
    SELECT *
    FROM (VALUES (1, 'Wars'), (1, 'Wars|Trojan'),
                 (1, 'Wars|Trojans|Mortals'), (1, 'Toys|Games'),
                 (2, 'Travel'), (2, 'Travel|Trade'),
                 (2, 'Communication|Language|Writing'),
                 (5, 'Oceanware'), (6, 'Love'),
                 (6, 'Love|Candy'), (6, 'Love|Valentines'),
                 (3, 'Sunshine'), (4, 'Lightning')
         ) AS T (ProductId, CategoryName)
)
-- Final query
SELECT
    T.ProductId,
    T.ProductName,
    T.CategoryName
FROM (
    SELECT
        P.ProductId,
        P.ProductName,
        C.CategoryName,
        LEN(C.CategoryName) - LEN(REPLACE(C.CategoryName, '|', '')) AS Pipes,
        -- Most pipes first; among ties, the longest category name wins
        ROW_NUMBER() OVER (
            PARTITION BY P.ProductId
            ORDER BY LEN(C.CategoryName) - LEN(REPLACE(C.CategoryName, '|', '')) DESC,
                     LEN(C.CategoryName) DESC
        ) AS RN
    FROM Products AS P
    JOIN ProductCategories AS C ON P.ProductId = C.ProductId
) AS T
WHERE T.RN = 1
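I eventually solved this using several subqueries. One thing to note is that it depends on the ProductCategories table in the example having a unique column, which I did not explicitly specify. In my real data this column already exists, but anyone facing a similar problem could add such a column to make this solution work. Here is the SQL: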
SELECT
    Sub1.ProductId,
    Sub3.Category
FROM
(
    -- Sub1: the maximum pipe count per product
    SELECT
        o.ProductId AS ProductId,
        MAX(LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category)) AS MaxPipeCount
    FROM
        Products o
        INNER JOIN ProductCategories c
            ON o.ProductId = c.ProductId
    GROUP BY o.ProductId
) Sub1
INNER JOIN
(
    -- Sub2: for each product and pipe count, one UniqueId to break ties
    SELECT
        o.ProductId AS ProductId,
        LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category) AS PipeCount,
        MAX(c.UniqueId) AS MaxUniqueId
    FROM
        Products o
        INNER JOIN ProductCategories c
            ON o.ProductId = c.ProductId
    GROUP BY o.ProductId, LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category)
) Sub2
    ON Sub1.MaxPipeCount = Sub2.PipeCount
    AND Sub1.ProductId = Sub2.ProductId
INNER JOIN
(
    -- Sub3: every category row with its pipe count and UniqueId
    SELECT DISTINCT
        o.ProductId,
        c.Category,
        LEN(REPLACE(c.Category, '|', '||')) - LEN(c.Category) AS PipeCount,
        c.UniqueId
    FROM
        Products o
        INNER JOIN ProductCategories c
            ON o.ProductId = c.ProductId
) Sub3
    ON Sub1.MaxPipeCount = Sub3.PipeCount
    AND Sub2.MaxUniqueId = Sub3.UniqueId
    AND Sub1.ProductId = Sub3.ProductId
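If ProductCategories has no unique column, one could be added in SQL Server along these lines. This is a sketch under assumptions: the column name UniqueId matches the query above, but the INT IDENTITY definition is only one possible choice and is not necessarily how the real table is defined.

```sql
-- Adds an auto-numbered column; SQL Server populates it for existing rows.
-- The INT IDENTITY(1, 1) definition is an assumption made for illustration.
ALTER TABLE ProductCategories
    ADD UniqueId INT IDENTITY(1, 1) NOT NULL;
```

IDENTITY is SQL Server syntax; other databases provide their own auto-increment mechanisms.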