我类似于以下数据集:
ID Text
1 The
1 The Quick
1 The Quick Brown
2 Fox
2 Fox
3 Jumps
3 Over The
3 Over The L
3 Over The Lazy
我需要有代码告诉我最短的"文本"字段中的文本是否包含在相同"ID"的所有其他字段中。
输出应如下所示:
ID Text Result
1 The Yes
1 The Quick Yes
1 The Quick Brown Yes
2 Fox Yes
2 Fox Yes
3 Jumps No
3 Over The No
3 Over The L No
3 Over The Lazy No
实际上,某些文本字段很长,这将跨越数千行数据。
谢谢你的帮助!
我们开始了。首先,确定每个值的长度,并确定长度最短的每个 ID 的值(或者只是使用限定语句标识它(。
select ID, text as shortest_text, length(text) as txt_length from table
qualify row_number() over(partition by ID order by txT_length)=1
OR
select ID, text as shortest_text
from table z
inner join (
select ID, max(txt_length) as max_length from (
select ID, text, length(text) as txt_length from table ) x)y
on z.Id=y.ID and length(z.text)=y.txt_length
然后,您必须基本上将其连接到原始表,并查看该值是否包含在每个ID的不同行中。
select ID, text, shortest_text, case when text like '%'+shortest_text+'%' then 1 else 0 end as contains_shortest_text
from table a
left join ([previous query, if you use Teradata I recommend the first option]) b
on a.ID=b.ID
最后,您将确定每个 ID 的所有行是否包含该值。
select ID, text, min(contains_shortest_text) over(partition by ID) as all_contain_shortest_text
from ([previous query]) x
试试这个:
;with shortestPerID as (
select A.ID as ID, A.Text as Text from
(select ID, Text, MAX(Len(Text)) as Length from TABLE_NAME group by ID, Text) as A
right join
(select ID, min(len(Text)) as Length from TABLE_NAME group by ID) as B
on (A.ID = B.ID and A.Length = B.Length)
)
select A.ID, A.Text,
case when B.Text is null then 'No' else 'Yes' end as Result
from TABLE_NAME as A left join shortestPerID as B
ON (A.ID = B.ID and charindex(B.Text, A.Text) > 0)
首先,CTE
是使用每个 ID 具有最短文本的表。然后简单连接就可以:)
试试这个SQL小提琴。
这是结果,它是不同的:对于 ID=3 和 Text='Jumps',它说是,因为 'Jumps' 包含在自身中:)
ID Text Result
1 The Yes
1 The Quick Yes
1 The Quick Brown Yes
2 Fox Yes
2 Fox Yes
3 Jumps Yes
3 Over The No
3 Over The L No
3 Over The Lazy No
编辑
按照@JohnRC的建议,为了获得最短Text
的'No'
,您可以在select
语句中使用其他JOIN
条件:
on (A.ID = B.ID and A.Length = B.Length and B.Text <> A.Text)
这是我的建议,使用临时表(SQL Server 2014(:
-- Get the minimum text length and number of entries per ID
SELECT ID
, Min(LEN(text)) MINTL
, Count(ID) IDCount
INTO #MINTL
FROM TABLE_NAME
GROUP BY ID;
-- Get all the minimum text(s) and entry count per ID
SELECT DISTINCT X.*
INTO #MINTEXT
FROM (
SELECT A.ID, A.text MinText, B.IDCount
FROM TABLE_NAME A
JOIN #MINTL B
ON A.ID = B.ID AND LEN(A.Text) = B.MinTL
) X;
-- Count the matches for each minimum text per ID
SELECT A.ID
, A.MinTEXT
, A.IDCount
, COUNT(B.ID) MatchCount
INTO #MATCHES
FROM #MINTEXT A
JOIN TABLE_NAME B
ON A.ID = B.ID
AND charindex(A.MinText, B.Text) > 0
GROUP BY A.ID
, A.MinText
, A.IDCount;
-- Produce the result, only say Yes if the number of matches = the entry count
SELECT A.ID
, A.Text
, CASE
WHEN B.IDCOUNT > B.MatchCount THEN 'No'
ELSE 'Yes' END Result
FROM TABLE_NAME A
JOIN #MATCHES B
ON A.ID = B.ID
ORDER BY A.ID, A.Text;
DROP TABLE #MINTL;
DROP TABLE #MINTEXT;
DROP TABLE #MATCHES;
使用与问题中相同的输入生成此结果
ID Text Result
----------- -------------------------------------------------- ------
1 The Yes
1 The Quick Yes
1 The Quick Brown Yes
2 Fox Yes
2 Fox Yes
3 Jumps No
3 Over The No
3 Over The L No
3 Over The Lazy No