确定最短"文本"字段中的文本是否包含在相同"ID"的其他行中的所有其他"文本"字段中



我类似于以下数据集:

ID   Text
1    The
1    The Quick
1    The Quick Brown
2    Fox
2    Fox
3    Jumps
3    Over The
3    Over The L
3    Over The Lazy
我需要有代码告诉我最短的"文本"字段中的文本

是否包含在相同"ID"的所有其他字段中。

输出应如下所示:

ID   Text              Result
1    The               Yes
1    The Quick         Yes
1    The Quick Brown   Yes
2    Fox               Yes
2    Fox               Yes
3    Jumps             No
3    Over The          No
3    Over The L        No
3    Over The Lazy     No

实际上,某些文本字段很长,这将跨越数千行数据。

谢谢你的帮助!

好的,

我们开始了。首先,确定每个值的长度,并确定长度最短的每个 ID 的值(或者只是使用限定语句标识它(。

select ID, text as shortest_text, length(text) as txt_length from table
qualify row_number() over(partition by ID order by txT_length)=1
OR
select ID, text as shortest_text 
from table z
inner join (
select ID, max(txt_length) as max_length from (
select ID, text, length(text) as txt_length from table ) x)y
on z.Id=y.ID and length(z.text)=y.txt_length

然后,您必须基本上将其连接到原始表,并查看该值是否包含在每个ID的不同行中。

select ID, text, shortest_text, case when text like '%'+shortest_text+'%' then 1 else 0 end as contains_shortest_text
from table a
left join ([previous query, if you use Teradata I recommend the first option]) b
on a.ID=b.ID

最后,您将确定每个 ID 的所有行是否包含该值。

select ID, text, min(contains_shortest_text) over(partition by ID) as all_contain_shortest_text
from ([previous query]) x

试试这个:

;with shortestPerID as (
    select A.ID as ID, A.Text as Text from
        (select ID, Text, MAX(Len(Text)) as Length from TABLE_NAME group by ID, Text) as A
    right join
        (select ID, min(len(Text)) as Length from TABLE_NAME group by ID) as B
    on (A.ID = B.ID and A.Length = B.Length)
)    
select A.ID, A.Text, 
       case when B.Text is null then 'No' else 'Yes' end as Result
from TABLE_NAME as A left join shortestPerID as B
ON (A.ID = B.ID and charindex(B.Text, A.Text) > 0)

首先,CTE是使用每个 ID 具有最短文本的表。然后简单连接就可以:)

试试这个SQL小提琴。

这是结果,它是不同的:对于 ID=3 和 Text='Jumps',它说是,因为 'Jumps' 包含在自身中:)

ID   Text              Result
1    The               Yes
1    The Quick         Yes
1    The Quick Brown   Yes
2    Fox               Yes
2    Fox               Yes
3    Jumps             Yes
3    Over The          No
3    Over The L        No
3    Over The Lazy     No

编辑

按照@JohnRC的建议,为了获得最短Text'No',您可以在select语句中使用其他JOIN条件:

on (A.ID = B.ID and A.Length = B.Length and B.Text <> A.Text)

这是我的建议,使用临时表(SQL Server 2014(:

        -- Get the minimum text length and number of entries per ID 
        SELECT ID
            , Min(LEN(text)) MINTL
            , Count(ID) IDCount 
            INTO #MINTL 
            FROM TABLE_NAME 
            GROUP BY ID;
        -- Get all the minimum text(s) and entry count per ID
        SELECT DISTINCT X.* 
            INTO #MINTEXT 
            FROM (
                SELECT A.ID, A.text MinText, B.IDCount 
                    FROM TABLE_NAME A
                    JOIN #MINTL B
                        ON A.ID = B.ID AND LEN(A.Text) = B.MinTL 
                ) X;
        -- Count the matches for each minimum text per ID
        SELECT A.ID
             , A.MinTEXT
             , A.IDCount
             , COUNT(B.ID) MatchCount
            INTO #MATCHES
            FROM #MINTEXT A
                JOIN TABLE_NAME B
                    ON A.ID = B.ID 
                    AND charindex(A.MinText, B.Text) > 0  
            GROUP BY A.ID
                   , A.MinText
                   , A.IDCount;
        -- Produce the result, only say Yes if the number of matches = the entry count
        SELECT A.ID
             , A.Text
             , CASE 
                   WHEN B.IDCOUNT > B.MatchCount THEN 'No' 
                   ELSE 'Yes' END Result
            FROM TABLE_NAME A
            JOIN #MATCHES B 
                ON A.ID = B.ID
            ORDER BY A.ID, A.Text;
        DROP TABLE #MINTL;
        DROP TABLE #MINTEXT;
        DROP TABLE #MATCHES;

使用与问题中相同的输入生成此结果

ID          Text                                               Result
----------- -------------------------------------------------- ------
1           The                                                Yes
1           The Quick                                          Yes
1           The Quick Brown                                    Yes
2           Fox                                                Yes
2           Fox                                                Yes
3           Jumps                                              No
3           Over The                                           No
3           Over The L                                         No
3           Over The Lazy                                      No

最新更新