规范化 sql 字符串组合



我有下表:

CREATE TABLE #Fruits
( 
    Fruits VARCHAR(100)
) 
INSERT INTO #Fruits (Fruits)
VALUES ( 'banana,apple'),
       ('apple,banana'),
       ('kiwi,jackfruit'),
       ('jackfruit, kiwi')

banana,apple
apple,banana
kiwi,jackfruit
jackfruit, kiwi

我想再添加一列,其中我获取每行中用逗号分隔的值并按字母顺序重新排列它们。我试图规范化这些值,因为就我而言,苹果,香蕉和香蕉,苹果是同样的东西。 猕猴桃,菠萝蜜和菠萝蜜,猕猴桃是同样的东西。输出应如下所示:

Fruits                Normalized_Fruits
banana,apple          apple,banana
apple,banana          apple,banana
kiwi,jackfruit        jackfruit, kiwi
jackfruit, kiwi       jackfruit, kiwi

我怎样才能达到预期的结果?

我对string_split最大的抱怨之一是它缺少每个值的序号位置。这使得像这样的情况更容易处理。这是另一种方法。我正在使用杰夫·莫登的拆分器,可以在这里找到。这里真的不需要光标。

我还冒昧地添加了一个 GroupID 列,这样一旦解析出每个值,您就知道每个值属于哪一行。如果"水果"列是唯一的,则可以跳过它,但很难确定。

CREATE TABLE #Fruits
( 
    GroupID int identity
    , Fruits VARCHAR(100)
) 
INSERT INTO #Fruits (Fruits)
VALUES ( 'banana,apple'),
       ('apple,banana'),
       ('kiwi,jackfruit'),
       ('jackfruit, kiwi')
;
with SortedResults as
(
    select f.GroupID
        , Item = ltrim(x.Item)
        , x.ItemNumber
        , RowNum = ROW_NUMBER() over(partition by GroupID order by ltrim(x.Item))
    from #Fruits f
    cross apply dbo.DelimitedSplit8K(f.Fruits, ',') x
)
select Max(case when RowNum = 1 then Item end) + ', ' + max(case when RowNum = 2 then Item end)
from SortedResults
group by GroupID
drop table #Fruits

试一试...我可能会被非 Cursor 的人钉住,但这就是我想出的。

    CREATE TABLE #Fruits
( 
    Fruits VARCHAR(100)
) 
INSERT INTO #Fruits (Fruits)
VALUES ( 'banana,apple'),
       ('apple,banana'),
       ('kiwi,jackfruit'),
       ('jackfruit, kiwi')

Declare @tblFruit Table (Fruit1 varchar(100))
Declare @tblFruitSorted Table (FruitSorted varchar(100))
Declare fCursor Cursor For
Select Fruits From #Fruits
Declare @Fruitunsorted varchar(100), @FruitSorted Varchar(100) = ''
Open fCursor
Fetch Next From fCursor Into @Fruitunsorted
While @@FETCH_STATUS = 0
BEGIN
Set @FruitSorted = ''
Insert Into @tblFruit
Select * From string_split(@Fruitunsorted,',')
Update @tblFruit Set Fruit1 = Ltrim(Rtrim(Fruit1))
Select @FruitSorted = @FruitSorted + ',' + Ltrim(Rtrim(Fruit1)) From @tblFruit Order by Fruit1
Delete From @tblFruit
Insert Into @tblFruitSorted
Select Substring(@FruitSorted,2, LEN(@FruitSorted)-1)
Fetch Next From fCursor into @Fruitunsorted
END
Close fCursor
Deallocate fCursor
Select * From @tblFruitSorted
Drop Table #Fruits

如果您使用的是 SQL 2017 或更高版本:

SELECT f.Fruits
    ,STRING_AGG(RTRIM(LTRIM(s.[value])),',') WITHIN GROUP (ORDER BY RTRIM(LTRIM(s.[value])))
FROM #Fruits f CROSS APPLY STRING_SPLIT(f.Fruits,',') s
GROUP BY f.Fruits
;

如果您使用的是旧版本的 SQL(如 2008):

IF OBJECT_ID('tempdb..#Fruits') IS NOT NULL DROP TABLE #Fruits;
CREATE TABLE #Fruits(Fruits VARCHAR(100));
INSERT INTO #Fruits (Fruits) VALUES 
    ('banana,apple'),
    ('apple,banana'),
    ('kiwi,jackfruit'),
    ('jackfruit, kiwi')
;
;WITH Split AS (
    SELECT DISTINCT a.Fruits,RTRIM(LTRIM(tbl.col.value ('@Value', 'nvarchar(max)'))) AS [Fruit]
    FROM (SELECT f.Fruits,CONVERT(XML,'<N Value="' + REPLACE(f.Fruits,',','"></N><N Value="') + '"></N>') AS [x] FROM #Fruits f) a
    CROSS APPLY a.x.nodes('//N') AS tbl (col)
)
SELECT r.Fruits,STUFF((SELECT ',' + s.Fruit FROM Split s WHERE s.Fruits = r.Fruits ORDER BY s.Fruit FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)'),1,1,'') AS [NormalizedFruits]
FROM #Fruits r
;
IF OBJECT_ID('tempdb..#Fruits') IS NOT NULL DROP TABLE #Fruits;

由于我们只谈论两个纯字母字符串的逗号分隔列表,为了好玩和简洁,我将再次抛弃 PARSENAME。修剪之所以存在,是因为源数据中空格的使用不一致,并且ELSE可能会更短,但我希望结果保持一致。

SELECT 
    Fruits
    ,CASE 
        WHEN LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),2))) > LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),1)))
        THEN LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),1))) + ', ' + LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),2)))
        ELSE LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),2))) + ', ' + LTRIM(RTRIM(PARSENAME(REPLACE(Fruits,',','.'),1)))
    END AS Normalized_Fruits
FROM #Fruits

而且 - 只是为了好玩 - 另一个解决方案召唤XQuery救援。

DECLARE @Fruits TABLE(Fruits VARCHAR(100));
INSERT INTO @Fruits (Fruits) VALUES 
    ('banana,apple'),
    ('apple,banana'),
    ('kiwi,jackfruit'),
    ('jackfruit, kiwi');
-

-这是查询

SELECT f.*
      ,CAST('<x>' + REPLACE(REPLACE(f.Fruits,' ',''),',','</x><x>') + '</x>' AS XML)
           .query('
                   for $f in /x/text()
                   order by $f
                   return <y>{concat(",",$f)}</y>
                  ')
           .value('substring(.,2,1000)','nvarchar(max)')
FROM @Fruits f;

通过使用for $f in distinct-values(/x/text())而不是for $f in /x/text(),我们将抑制重复的单词。

总之:
字符串将转换为 XML。这允许.query(),它可以处理XQuery。对于相当普遍的问题,这是非常强大的。单词被排序并以前导逗号返回。需要最后的substring()来切断第一个前导逗号。

最新更新