SQL Server query in an SSIS transfer times out because of 174 UNION ALL statements



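I have a table in Hive and a table in SQL Server, with the data stored as shown below. I am using SSIS to move this data into SQL Server. The query takes too long: the Description column holds about 175 distinct values, which leads to 174 UNION ALL statements, and the query times out after about 2 hours.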

SQL Error [08S01]: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out

Is there a better way to write this query?

Thanks!

Hive:

ID  | Description
----+------------------------------
1  | Desc1;Desc2;Desc3;Desc4
2  | Desc1;Desc3;Desc4;Desc5;Desc6
...
230 | Desc8;Desc163;Desc9;Desc2;Desc172

SQL Server:

CaseID | GroupID | Description
-------+---------+--------------
1   |    63   | Desc1
1   |    44   | Desc2
1   |    57   | Desc3
1   |    78   | Desc4
...
2   |    78   | Desc1
2   |    57   | Desc3

Query:

select 
case 
when cas.description like '%Desc1%' then 63 
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid 
current_timestamp as INSERT_DT
from 
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all 
select 
case 
when cas.description like '%Desc2%' then 44
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid 
current_timestamp as INSERT_DT
from 
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select 
case 
when cas.description like '%Desc3%' then 57 
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid 
current_timestamp as INSERT_DT
from 
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
union all
select 
case 
when cas.description like '%Desc4%' then 78 
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid 
current_timestamp as INSERT_DT
from 
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'
...
select 
case 
when cas.description like '%Desc175%' then 12 
end as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid 
current_timestamp as INSERT_DT
from 
svc_case cas
inner join account acc on acc.id = cas.id
where cas.description <> 'NULL' and LENGTH(cas.description) > 0
and acc.recordid = '03443FGT'

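This is a stab in the dark, but there are 2 things you can do to improve this query. First, let's deal with all those UNION ALLs. If I understand your query correctly, you can unpivot your data to achieve the same result: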

SELECT V.groupid,
cas.id AS caseid,
current_timestamp as INSERT_DT
FROM dbo.svc_case cas
JOIN dbo.account acc on acc.id = cas.id
CROSS APPLY (VALUES(CASE WHEN cas.description LIKE '%Desc1%' THEN 63 END),
(CASE WHEN cas.description LIKE '%Desc2%' THEN 44 END),
(CASE WHEN cas.description LIKE '%Desc3%' THEN 57 END),
(CASE WHEN cas.description LIKE '%Desc4%' THEN 78 END),
--I assume there are 174 more of these
(CASE WHEN cas.description LIKE '%Desc178%' THEN 1 END))V(groupid) --The last one isn't correct, but to show how the `APPLY` ends

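Then there is your WHERE, which is not SARGable because of LENGTH. LENGTH is not actually a T-SQL function (the T-SQL equivalent is LEN), so I hope you really are using SQL Server (if you are not, this answer is wasted, since the query above is T-SQL specific). Since LEN(NULL) returns NULL, just use <> '' instead. And since you already have <> 'NULL', you could use NOT IN: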

WHERE cas.description NOT IN('NULL','')
AND acc.recordid = '03443FGT'

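However, I do recommend against storing the literal string value 'NULL' in your column. You should fix that and store an actual NULL rather than 'NULL'; the two are different values and behave very differently.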
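To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table `t` and its four rows are made up for illustration; the NOT IN and IS NULL semantics shown follow standard SQL three-valued logic, which SQL Server shares, but it is worth confirming against your own data.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, description TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "Desc1"),  # real value
        (2, "NULL"),   # the four-character string 'NULL'
        (3, None),     # an actual NULL
        (4, ""),       # empty string
    ],
)

# NOT IN drops the string 'NULL' and the empty string. The actual NULL is
# dropped too, because NULL NOT IN (...) evaluates to unknown, not true.
not_in_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM t WHERE description NOT IN ('NULL', '') ORDER BY id"
    )
]

# IS NULL finds only the actual NULL, never the string 'NULL'.
is_null_ids = [
    row[0] for row in conn.execute("SELECT id FROM t WHERE description IS NULL")
]

print(not_in_ids, is_null_ids)
```

Only the real NULL in row 3 is found by IS NULL; the string in row 2 has to be filtered out by comparing text.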

Run the query only once: no unions, and leave out the CASE. Use a Multicast in SSIS and split the data there.
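The idea of reading each row once and splitting it downstream can be sketched outside SSIS. The Python below is not SSIS code, only an illustration of the split step; `GROUP_IDS` and `split_rows` are hypothetical names, and the mapping contains only the four code-to-groupid pairs that appear in the question.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical lookup; only these four pairs are given in the question.
GROUP_IDS = {"Desc1": 63, "Desc2": 44, "Desc3": 57, "Desc4": 78}


def split_rows(
    rows: Iterable[Tuple[int, Optional[str]]]
) -> Iterator[Tuple[int, int, str]]:
    """Yield (caseid, groupid, description) for each known code in a row."""
    for case_id, description in rows:
        # Skip real NULLs, empty strings, and the literal string 'NULL'.
        if description is None or description in ("", "NULL"):
            continue
        for code in description.split(";"):
            code = code.strip()
            group_id = GROUP_IDS.get(code)
            if group_id is not None:  # codes without a mapping are dropped
                yield case_id, group_id, code


source_rows = [(1, "Desc1;Desc2;Desc3;Desc4"), (2, "NULL"), (3, None)]
result = list(split_rows(source_rows))
print(result)
```

Each source row is read a single time, no matter how many distinct codes exist.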

You can explode the codes and use a case expression to convert them to numbers:

select (case when code = 'Desc1' then 63
when code = 'Desc2' then 44
. . .
end) as groupid, -- maps to groupid
cas.id as caseid, -- maps to caseid 
current_timestamp as INSERT_DT
from svc_case cas join
account acc
on acc.id = cas.id lateral view
explode(split(cas.description, ';')) codes as code
where acc.recordid = '03443FGT';

I don't know why you have description <> 'NULL'. I'm guessing you really want is not null, which is unnecessary with the lateral join.

Also, if you have a reference table with one row per code and groupid, you can simplify the code further by joining to that table.
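A minimal sketch of the reference-table join, run through Python's built-in sqlite3 module so it can be tried without Hive or SQL Server. The table name `desc_group` and the sample rows are assumptions made for illustration. The join wraps both sides in the `;` delimiter so that a code such as Desc1 does not also match Desc10; the `||` concatenation operator is SQLite syntax, and T-SQL would use `+` or CONCAT instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE svc_case (id INTEGER, description TEXT);
    -- Hypothetical reference table: one row per code and groupid.
    CREATE TABLE desc_group (code TEXT, groupid INTEGER);

    INSERT INTO svc_case VALUES
        (1, 'Desc1;Desc2;Desc3;Desc4'),
        (2, 'Desc1;Desc3'),
        (3, 'Desc10');  -- must not match Desc1

    INSERT INTO desc_group VALUES
        ('Desc1', 63), ('Desc2', 44), ('Desc3', 57), ('Desc4', 78);
    """
)

# One join replaces the per-code SELECT blocks. Adding a new code means
# inserting a row into desc_group rather than editing the query.
rows = conn.execute(
    """
    SELECT cas.id AS caseid, ref.groupid
    FROM svc_case cas
    JOIN desc_group ref
      ON ';' || cas.description || ';' LIKE '%;' || ref.code || ';%'
    ORDER BY cas.id, ref.groupid
    """
).fetchall()

print(rows)
```

In Hive, the same reference table could presumably be joined on the exploded code with a plain equality instead of LIKE.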
