SQL Server XML processing: joining different nodes based on an ID



I am trying to query XML with SQL. Suppose I have the following XML.

<xml>
<dataSetData>
<text>ABC</text>
</dataSetData>
<generalData>
<id>123</id>
<text>text data</text>
</generalData>
<generalData>
<id>456</id>
<text>text data 2</text>
</generalData>
<specialData>
<id>123</id>
<text>special data text</text>
</specialData>
<specialData>
<id>456</id>
<text>special data text 2</text>
</specialData>
</xml>

I want to write a SELECT query that returns 2 rows, like this:

DataSetData | GeneralDataID | GeneralDataText | SpecialDataTest
ABC         | 123           | text data       | special data text
ABC         | 456           | text data 2     | special data text 2

My current approach is as follows:

SELECT 
dataset.nodes.value('(dataSetData/text)[1]', 'nvarchar(500)'),
general.nodes.value('(generalData/text)[1]', 'nvarchar(500)'),
special.nodes.value('(specialData/text)[1]', 'nvarchar(500)'),
FROM @MyXML.nodes('xml') AS dataset(nodes)
OUTER APPLY @MyXML.nodes('xml/generalData') AS general(nodes)
OUTER APPLY @MyXML.nodes('xml/specialData') AS special(nodes)
WHERE 
general.nodes.value('(generalData/text/id)[1]', 'nvarchar(500)') = special.nodes.value('(specialData/text/id)[1]', 'nvarchar(500)')

What I don't like here is that I have to use OUTER APPLY twice and need a WHERE clause to join the matching elements.

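So my question is: is it possible to construct the query so that the WHERE clause is not needed in this way? I am fairly sure it will hurt performance badly once the files get bigger.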

Shouldn't it be possible to join the right nodes (that is, the corresponding generalData and specialData nodes) with some XPath expression?

Your XPath expressions are completely off.

Please try the following. It is very efficient; you can test its performance with a large XML.

SQL

-- DDL and sample data population, start
DECLARE @xml XML = 
N'<xml>
<dataSetData>
<text>ABC</text>
</dataSetData>
<generalData>
<id>123</id>
<text>text data</text>
</generalData>
<generalData>
<id>456</id>
<text>text data 2</text>
</generalData>
<specialData>
<id>123</id>
<text>special data text</text>
</specialData>
<specialData>
<id>456</id>
<text>special data text 2</text>
</specialData>
</xml>';
-- DDL and sample data population, end
SELECT c.value('(dataSetData/text/text())[1]', 'VARCHAR(20)') AS DataSetData
, g.value('(id/text())[1]', 'INT') AS GeneralDataID 
, g.value('(text/text())[1]', 'VARCHAR(30)') AS GeneralDataText
, sp.value('(id/text())[1]', 'INT') AS SpecialDataID 
, sp.value('(text/text())[1]', 'VARCHAR(30)') AS SpecialDataTest
FROM @xml.nodes('/xml') AS t(c)
OUTER APPLY c.nodes('generalData') AS general(g)
OUTER APPLY c.nodes('specialData') AS special(sp)
WHERE g.value('(id/text())[1]', 'INT') = sp.value('(id/text())[1]', 'INT');

Output

+-------------+---------------+-----------------+---------------+---------------------+
| DataSetData | GeneralDataID | GeneralDataText | SpecialDataID |   SpecialDataTest   |
+-------------+---------------+-----------------+---------------+---------------------+
| ABC         |           123 | text data       |           123 | special data text   |
| ABC         |           456 | text data 2     |           456 | special data text 2 |
+-------------+---------------+-----------------+---------------+---------------------+

I would like to suggest one more solution:

DECLARE @xml XML=
N'<xml>
<dataSetData>
<text>ABC</text>
</dataSetData>
<generalData>
<id>123</id>
<text>text data</text>
</generalData>
<generalData>
<id>456</id>
<text>text data 2</text>
</generalData>
<specialData>
<id>123</id>
<text>special data text</text>
</specialData>
<specialData>
<id>456</id>
<text>special data text 2</text>
</specialData>
</xml>';

-- The query

SELECT @xml.value('(/xml/dataSetData/text/text())[1]','varchar(100)')
,B.*
,@xml.value('(/xml/specialData[(id/text())[1] cast as xs:int? = sql:column("B.General_Id")]/text/text())[1]','varchar(100)') AS Special_Text
FROM @xml.nodes('/xml/generalData') A(gd)
CROSS APPLY(SELECT A.gd.value('(id/text())[1]','int') AS General_Id
,A.gd.value('(text/text())[1]','varchar(100)') AS General_Text) B;

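In short:

  • We can read <dataSetData> directly from the variable, because it does not repeat
  • We can use .nodes() to get a derived set of all <generalData> entries
  • Now the magic: I use APPLY to get the values out of the XML as regular columns in the result set
  • This trick makes it possible to use sql:column() to build an XQuery predicate that finds the corresponding <specialData>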
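To look at that predicate in isolation, here is a minimal sketch that performs the same lookup for one fixed id. The variable @id and the use of sql:variable() in place of sql:column() are assumptions made for illustration only; they are not part of the query above.

```sql
-- Same sample XML as above, written compactly.
DECLARE @xml XML =
N'<xml>
  <dataSetData><text>ABC</text></dataSetData>
  <generalData><id>123</id><text>text data</text></generalData>
  <generalData><id>456</id><text>text data 2</text></generalData>
  <specialData><id>123</id><text>special data text</text></specialData>
  <specialData><id>456</id><text>special data text 2</text></specialData>
</xml>';

-- Hypothetical lookup key; in the full query this value comes from sql:column("B.General_Id").
DECLARE @id INT = 456;

-- The XQuery predicate keeps only the <specialData> element whose <id> equals @id,
-- then reads that element's <text> value.
SELECT @xml.value('(/xml/specialData[(id/text())[1] cast as xs:int? = sql:variable("@id")]/text/text())[1]'
                 ,'varchar(100)') AS Special_Text;
```

With the sample data this should return the text of the <specialData> element with id 456. The full query does the same thing once per <generalData> row, which is why the value has to exist as a regular column first.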

Another approach with FLWOR

You might try this:

SELECT @xml.query
('
<xml>
{
for $i in distinct-values(/xml/generalData/id/text())
return
<combined dsd="{/xml/dataSetData/text/text()}"
id="{$i}"
gd="{/xml/generalData[id=$i]/text/text()}"
sd="{/xml/specialData[id=$i]/text/text()}"/>
}
</xml>
');

The result

<xml>
<combined dsd="ABC" id="123" gd="text data" sd="special data text" />
<combined dsd="ABC" id="456" gd="text data 2" sd="special data text 2" />
</xml>

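In short:

  • With the help of distinct-values() we get a list of all id values in the XML
  • We can iterate over this list and pick the corresponding values
  • We return the result as restructured XML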

Now you can use .nodes('/xml/combined') against this new XML and retrieve all values easily.
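That last step can be sketched as follows. The intermediate variable @combined and the column aliases are names I introduce for illustration; the FLWOR expression itself is the one shown above.

```sql
-- Same sample XML as above, written compactly.
DECLARE @xml XML =
N'<xml>
  <dataSetData><text>ABC</text></dataSetData>
  <generalData><id>123</id><text>text data</text></generalData>
  <generalData><id>456</id><text>text data 2</text></generalData>
  <specialData><id>123</id><text>special data text</text></specialData>
  <specialData><id>456</id><text>special data text 2</text></specialData>
</xml>';

-- @combined (hypothetical name) holds the restructured XML returned by the FLWOR query.
DECLARE @combined XML = @xml.query
('
<xml>
{
for $i in distinct-values(/xml/generalData/id/text())
return
<combined dsd="{/xml/dataSetData/text/text()}"
          id="{$i}"
          gd="{/xml/generalData[id=$i]/text/text()}"
          sd="{/xml/specialData[id=$i]/text/text()}"/>
}
</xml>
');

-- Each <combined> element becomes one row; its attributes become the columns.
SELECT c.value('@dsd', 'varchar(100)') AS DataSetData
     , c.value('@id',  'int')          AS GeneralDataID
     , c.value('@gd',  'varchar(100)') AS GeneralDataText
     , c.value('@sd',  'varchar(100)') AS SpecialDataText
FROM @combined.nodes('/xml/combined') AS A(c);
```

Reading attributes with '@name' needs no positional [1] predicate, because an attribute can occur at most once per element.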

Performance test

I just want to add a performance test:

CREATE TABLE dbo.TestXml(TheXml XML);
INSERT INTO dbo.TestXml VALUES
(
(
SELECT 'blah1' AS [dataSetData/text]
,(SELECT o.[object_id] AS [id]
,o.[name]      AS [text] 
FROM sys.objects o
FOR XML PATH('generalData'),TYPE)
,(SELECT o.[object_id] AS [id]
,o.create_date AS [text] 
FROM sys.objects o
FOR XML PATH('specialData'),TYPE)
FOR XML PATH('xml'),TYPE
)
)
,(
(
SELECT 'blah2' AS [dataSetData/text]
,(SELECT o.[object_id] AS [id]
,o.[name]      AS [text] 
FROM sys.objects o
FOR XML PATH('generalData'),TYPE)
,(SELECT o.[object_id] AS [id]
,o.create_date AS [text] 
FROM sys.objects o
FOR XML PATH('specialData'),TYPE)
FOR XML PATH('xml'),TYPE
)
)
,(
(
SELECT 'blah3' AS [dataSetData/text]
,(SELECT o.[object_id] AS [id]
,o.[name]      AS [text] 
FROM sys.objects o
FOR XML PATH('generalData'),TYPE)
,(SELECT o.[object_id] AS [id]
,o.create_date AS [text] 
FROM sys.objects o
FOR XML PATH('specialData'),TYPE)
FOR XML PATH('xml'),TYPE
)
);
GO
--just a dummy call to avoid *first call bias*
SELECT x.query('.') FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml//*') A(x)
GO
DECLARE @t DATETIME2=SYSUTCDATETIME();
--My first approach
SELECT TheXml.value('(/xml/dataSetData/text/text())[1]','varchar(100)') AS DataSetValue
,B.*
,TheXml.value('(/xml/specialData[(id/text())[1] cast as xs:int? = sql:column("B.General_Id")]/text/text())[1]','varchar(100)') AS Special_Text
INTO dbo.testResult1
FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml/generalData') A(gd)
CROSS APPLY(SELECT A.gd.value('(id/text())[1]','int') AS General_Id
,A.gd.value('(text/text())[1]','varchar(100)') AS General_Text) B;
SELECT DATEDIFF(MILLISECOND,@t,SYSUTCDATETIME());
GO


DECLARE @t DATETIME2=SYSUTCDATETIME();
--My second approach
SELECT B.c.value('@dsd','varchar(100)') AS dsd
,B.c.value('@id','int') AS id
,B.c.value('@gd','varchar(100)') AS gd
,B.c.value('@sd','varchar(100)') AS sd
INTO dbo.TestResult2
FROM dbo.TestXml
CROSS APPLY (SELECT TheXml.query
('
<xml>
{
for $i in distinct-values(/xml/generalData/id/text())
return
<combined dsd="{/xml/dataSetData/text/text()}"
id="{$i}"
gd="{/xml/generalData[id=$i]/text/text()}"
sd="{/xml/specialData[id=$i]/text/text()}"/>
}
</xml>
') AS ResultXml) A
CROSS APPLY A.ResultXml.nodes('/xml/combined') B(c) 
SELECT DATEDIFF(MILLISECOND,@t,SYSUTCDATETIME());
GO
DECLARE @t DATETIME2=SYSUTCDATETIME();
--Yitzhak's approach
SELECT c.value('(dataSetData/text/text())[1]', 'VARCHAR(20)') AS DataSetData
, g.value('(id/text())[1]', 'INT') AS GeneralDataID 
, g.value('(text/text())[1]', 'VARCHAR(30)') AS GeneralDataText
, sp.value('(id/text())[1]', 'INT') AS SpecialDataID 
, sp.value('(text/text())[1]', 'VARCHAR(30)') AS SpecialDataTest
INTO dbo.TestResult3
FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml') AS t(c)
OUTER APPLY c.nodes('generalData') AS general(g)
OUTER APPLY c.nodes('specialData') AS special(sp)
WHERE g.value('(id/text())[1]', 'INT') = sp.value('(id/text())[1]', 'INT');
SELECT DATEDIFF(MILLISECOND,@t,SYSUTCDATETIME());
GO
SELECT * FROM TestResult1;
SELECT * FROM TestResult2;
SELECT * FROM TestResult3;
GO
--careful with real data!
DROP TABLE testResult1
DROP TABLE testResult2
DROP TABLE testResult3
DROP TABLE dbo.TestXml;

The result clearly speaks against XQuery. (Some might say: too sad! :-( )

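The predicate approach is by far the slowest (4700 ms). The FLWOR approach comes second (1200 ms), and the winner is -tatataaaaa- Yitzhak's approach (400 ms, faster by a factor of about 10!).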

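Which solution is best for you depends on the actual data (the number of elements per XML, the number of XMLs, and so on). Sadly, visual elegance is not the only parameter for this choice :-(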

Sorry for adding this as another answer, but I did not want to add it to the other one. That one is big enough already :-)

A combination of Yitzhak's approach and mine is even faster:

--This is the additional code to place into the performance comparison

DECLARE @t DATETIME2=SYSUTCDATETIME();
SELECT TheXml.value('(/xml/dataSetData/text/text())[1]', 'VARCHAR(20)') AS DataSetData
,B.*
, sp.value('(id/text())[1]', 'INT') AS SpecialDataID 
, sp.value('(text/text())[1]', 'VARCHAR(30)') AS SpecialDataTest
INTO dbo.TestResult4
FROM dbo.TestXml
CROSS APPLY TheXml.nodes('/xml/generalData') AS A(g)
CROSS APPLY(SELECT g.value('(id/text())[1]', 'INT') AS GeneralDataID 
, g.value('(text/text())[1]', 'VARCHAR(30)') AS GeneralDataText) B
OUTER APPLY TheXml.nodes('/xml/specialData[id=sql:column("B.GeneralDataID")]') AS special(sp);
SELECT DATEDIFF(MILLISECOND,@t,SYSUTCDATETIME());

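In short:

  • We read <dataSetData> directly (it does not repeat)
  • We use APPLY .nodes() to get all <generalData> elements
  • We use APPLY SELECT to get the values of the <generalData> elements as real columns
  • We use another APPLY .nodes() to get the corresponding <specialData> elements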

One advantage of this solution: it also works if there can be more than one special data entry per general data element.

This was the fastest in my test (about 300 ms).
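To illustrate the remark about several special data entries per general data element, here is a small sketch of the combined query run against a variable instead of the test table. The modified sample data (two <specialData> entries for id 123, none for id 456) is made up for this illustration.

```sql
-- Hypothetical sample: id 123 has two <specialData> entries, id 456 has none.
DECLARE @xml XML =
N'<xml>
  <dataSetData><text>ABC</text></dataSetData>
  <generalData><id>123</id><text>text data</text></generalData>
  <generalData><id>456</id><text>text data 2</text></generalData>
  <specialData><id>123</id><text>special data text</text></specialData>
  <specialData><id>123</id><text>special data text (second entry)</text></specialData>
</xml>';

SELECT @xml.value('(/xml/dataSetData/text/text())[1]', 'VARCHAR(20)') AS DataSetData
     , B.*
     , sp.value('(id/text())[1]', 'INT')           AS SpecialDataID
     , sp.value('(text/text())[1]', 'VARCHAR(50)') AS SpecialDataText
FROM @xml.nodes('/xml/generalData') AS A(g)
-- Pull the <generalData> values out as regular columns so sql:column() can see them.
CROSS APPLY (SELECT g.value('(id/text())[1]', 'INT')           AS GeneralDataID
                  , g.value('(text/text())[1]', 'VARCHAR(30)') AS GeneralDataText) AS B
-- One row per matching <specialData>; OUTER APPLY keeps general rows without a match.
OUTER APPLY @xml.nodes('/xml/specialData[id=sql:column("B.GeneralDataID")]') AS special(sp);
```

I would expect two rows for id 123 (one per special data entry) and one row for id 456 with NULL in the special data columns, since OUTER APPLY keeps rows that have no match. Switching to CROSS APPLY would drop such unmatched rows.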
