SQL Server:比较连接列的无序集是否相等



我有两个表,其中一个是ID列(将两个表匹配在一起),另一个是连接列,其中连接的值以任意顺序排列。我想比较一下这两列是否包含完全相同的项目(以任何顺序),如果不包含,则输出ID。

的例子:表1

PersonID    Products
1           Apple|Pear|Orange
2           Flour|Apple|Butter
3           Apple
4           Banana|Cashews
5           Juice|Crackers|Banana|Cashews
6           Cashews
表2

PersonID    Products
1           Orange|Apple|Pear
2           Flour|Apple|Butter
3           Apple|Banana
4           Banana
5           Crackers|Juice|Banana|Cashews
6           Pear|Crackers

我想获得表1和表2之间产品不是相同集合(任何订单)的所有personid。在这种情况下就是:第三个人(多余的产品),第四个人(缺少的产品),第六个人(不同的产品)。

我当前的查询错误地选取了person 1和person 5,因为他们订购的产品不同。

我当前的查询是这样的:

select t1.personid, t1.products as t1products, t2.products as t2products
from table1 t1 (nolock)
inner join table2 t2 (nolock) on t1.personid = t2.personid
where t1.products != t2.products

我也有预连接形式的数据,每个人有多行(每个产品一个,再分别在两个表中),如果这更有帮助的话——我还没有弄清楚如何按字母顺序连接它们,所以解决这个问题的解决方案也会解决这个问题。

编辑(澄清):未连接的数据如下所示:

表1

PersonID    Product
1           Apple
1           Pear
1           Orange
2           Flour
2           Apple
2           Butter
3           Apple

等。表2

PersonID    Product
1           Orange
1           Apple
1           Pear
2           Flour
2           Apple
2           Butter
3           Apple
3           Banana

等。

我使用STUFF按PersonID将它们连接起来。

如果每一行都有一个product的形式,那么就可以查询另一个表中product和personid不匹配的所有结果。然后对另一个表执行相同的操作,并将结果合并:

SELECT t1.personid, t1.product, '2' AS [Not Found In Table]
FROM table1 t1 
LEFT JOIN table2 t2 ON t1.personid = t2.personid AND t1.product = t2.product
WHERE t2.product IS NULL
UNION
SELECT t2.personid, t2.product,  '1' AS [Not Found In Table]
FROM table2 t2 
LEFT JOIN table1 t1 ON t2.personid = t1.personid AND t2.product = t1.product
WHERE t1.product IS NULL

您可以将其封装在select和CONCAT结果中,以便为每个不匹配的人提供一个从哪个表中缺少的内容的漂亮列表。

测试数据

Declare @t1 TABLE (PersonID INT, Products Varchar(200))
INSERT INTO @t1 VALUES
(1   ,'Apple|Pear|Orange'),
(2   ,'Flour|Apple|Butter'),
(3   ,'Apple'),
(4   ,'Banana|Cashews'),
(5   ,'Juice|Crackers|Banana|Cashews'),
(6   ,'Cashews');
Declare @t2 TABLE (PersonID INT, Products Varchar(200))
INSERT INTO @t2 VALUES
(1   ,'Orange|Apple|Pear'),
(2   ,'Flour|Apple|Butter'),
(3   ,'Apple|Banana'),
(4   ,'Banana'),
(5   ,'Crackers|Juice|Banana|Cashews'),
(6   ,'Pear|Crackers');
查询

WITH Table1 AS (
SELECT  PersonID
        ,Split.a.value('.', 'VARCHAR(100)') Products
FROM   
    (SELECT PersonID
            ,Cast ('<X>' + Replace(Products, '|', '</X><X>') + '</X>' AS XML) AS Data
    FROM    @t1
    ) AS t CROSS APPLY Data.nodes ('/X') AS Split(a) 
), Table2 AS (
SELECT  PersonID
        ,Split.a.value('.', 'VARCHAR(100)') Products
FROM   
    (SELECT PersonID
            ,Cast ('<X>' + Replace(Products, '|', '</X><X>') + '</X>' AS XML) AS Data
    FROM    @t2
    ) AS t CROSS APPLY Data.nodes ('/X') AS Split(a) 
)
SELECT t1.PersonID 
FROM Table1 t1
WHERE NOT EXISTS (SELECT 1 
                  FROM Table2 t2
                  WHERE t1.PersonID = t2.PersonID 
                  AND t1.Products = t2.Products)
UNION  
SELECT t2.PersonID 
FROM Table2 t2
WHERE NOT EXISTS (SELECT 1 
                  FROM Table1 t1
                  WHERE t1.PersonID = t2.PersonID 
                  AND t1.Products = t2.Products)

我只是对完全连接的行进行配对。这样,如果出现pair,则表示产品匹配,如果没有,则表示有问题。所以我希望这个简单的查询也可以解决你的问题:

SELECT DISTINCT PersonID FROM (
    SELECT * FROM table1 
    UNION ALL 
    SELECT * FROM table2
) d 
GROUP BY PersonID, Products 
HAVING COUNT(*) != 2

最新更新