Xpath in Sql Server



我需要一些Xpath方面的帮助。我有一个Xml文档,我每天都需要分解(大约500mb),其中包含公共交通上的事务。

Xml的示例如下所示:

https://docs.google.com/document/d/11Ov6KXo83pu3xEXGovrFK-Hm2cMjZW1gqwTnRTraj58/edit?usp=sharing

真实的文件包含数百个站点和数千个事务。

我有一个工作存储过程来分解这里的文档:https://docs.google.com/document/d/1tegGjaK0x7L3uajeUlD1ex0F-yoeSueujDTarkW_Bak/edit?usp=sharing

问题是,当我从Sql Server Agent在我们的VM上运行它时,需要5个小时。

理想情况下,这将在一个小时内完成(它确实在我的个人机器上运行)。

我想也许我可以做这样的事情,在一个语句中完成所有事情,而不是使用临时表和连接:

SELECT
se.value('@Filename', 'varchar(50)') As [Filename], 
se.value('@FareIdentifier', 'varchar(50)') As FareIdentifier,
se.value('@DeviceType', 'varchar(50)') As DeviceType,
ro.value('@ServiceIdentifier', 'varchar(50)') As ServiceIdentifier,
ro.value('@ServiceNumber', 'varchar(50)') As ServiceNumber,
ro.value('@CarrierNumber', 'varchar(50)') AS CarrierNumber,
jo.value('@VehicleRegistration', 'varchar(50)') As VehicleRegistration,
jo.value('@VehicleNumber', 'varchar(50)') As VehicleNumber,
jo.value('@VehicleDepotNumber', 'varchar(50)') As VehicleDepotNumber,
jo.value('@ServiceDepotNumber', 'varchar(50)') As ServiceDepotNumber,
jo.value('@JourneyDirection', 'varchar(50)') As JourneyDirection,
jo.value('@JourneyEnd', 'varchar(50)') As JourneyEnd,
jo.value('@JourneyStart', 'varchar(50)') As JourneyStart,
jo.value('@RunningBoardNumber', 'varchar(50)') As RunningBoardNumber,
jo.value('@RunningBoardDepotNumber', 'varchar(50)') As RunningBoardDepotNumber,
jo.value('@DutyNumber', 'varchar(50)') As DutyNumber,
jo.value('@DutyDepotNumber', 'varchar(50)') As DutyDepotNumber,
jo.value('@DriverName', 'varchar(50)') As DriverName,
jo.value('@DriverNumber', 'varchar(50)') As DriverNumber,
jo.value('@FinalTicketSerial', 'varchar(50)') As FinalTicketSerial,
jo.value('@StartTicketSerial', 'varchar(50)') As StartTicketSerial,
jo.value('@ContractType', 'varchar(50)') As ContractType,
jo.value('@DriverDepotNumber', 'varchar(50)') As DriverDepotNumber,
jo.value('@EtmDepotNumber', 'varchar(50)') As EtmDepotNumber,
jo.value('@Company', 'varchar(50)') As Company,
jo.value('@JourneyType', 'varchar(50)') As JourneyType,
jo.value('@JourneyNumber', 'varchar(50)') As JourneyNumber,
st.value('@TCA', 'varchar(50)') As TCA,
st.value('@Latitude', 'varchar(50)') As Latitude,
st.value('@Longitude', 'varchar(50)') As Longitude,
st.value('@NaptanCode', 'varchar(50)') As NaptanCode,
st.value('@AtcoCode', 'varchar(50)') As ATCOCode,
st.value('@StopName', 'varchar(50)') As StopName,
st.value('@BusStopNumber', 'varchar(50)') As BusStopNumber,
st.value('@StopNumber', 'varchar(50)') As StopNumber,
st.value('@BoardingFareStageName', 'varchar(50)') As BoardingFareStageName,
st.value('@BoardingFareStageNumber', 'varchar(50)') As BoardingFareStageNumber,
st.value('@BoardingFareStageOwner', 'varchar(50)') As BoardingFareStageOwner,
st.value('@RealDeparture', 'varchar(50)') As RealDeparture,
st.value('@RealArrival', 'varchar(50)') As RealArrival,
st.value('@OwnerNumber', 'varchar(50)') As OwnerNumber,
cs.value('@SoldTicketClassName', 'varchar(50)') As SoldTicketClassName,
cs.value('@SalesPrice', 'varchar(50)') As SalesPrice,
cs.value('@PaymentMethod', 'varchar(50)') As PaymentMethod,
cs.value('@ISOCurrency', 'varchar(50)') As ISOCurrency,
cs.value('@SoldTicketClassNo', 'varchar(50)') As SoldTicketClassNo,
cs.value('@TicketProductName', 'varchar(50)') As TicketProductName,
cs.value('@TicketProductNumber', 'varchar(50)') As TicketProductNumber,
cs.value('@NumTransactions', 'varchar(50)') As NumTransactions,
cs.value('@NumPgersOnTicket', 'varchar(50)') As NumPgersOnTicket,
cs.value('@IssueDateTime', 'varchar(50)') As IssueDateTime,
cs.value('@TicketNumber', 'varchar(50)') As TicketNumber,
cs.value('@TicketGUID', 'varchar(50)') As TicketGUID,
cs.value('@DestinationFareStageName', 'varchar(50)') As DestinationFareStageName,
cs.value('@DestinationFareStageNumber', 'varchar(50)') As DestinationFareStageNumber,
cs.value('@DestinationFareStageOwner', 'varchar(50)') As DestinationFareStageOwner,
cs.value('@EventDateTime', 'varchar(50)') As EventDateTime,
cs.value('@TransactionType', 'varchar(50)') As TransactionType,
cs.value('@CardId', 'varchar(50)') As CardId,
cs.value('@SchemeIdentifier', 'varchar(50)') As SchemeIdentifier
FROM
      XMLUpload CROSS APPLY
      XmlData.nodes('Header/Session')   AS [Session](se)        CROSS APPLY
      se.nodes('Routes/Route')          AS [Route](ro)          CROSS APPLY
ro.nodes('Journey')               As [Journey](jo)        CROSS APPLY
jo.nodes('DrivenStops/Stop')      As [Stop](st)           CROSS APPLY
st.nodes('Events/*')      As [CashSale](cs) 

这不起作用,因为它为元素CashSale和CardUsage创建了重复的行(其中一行的CashSale字段为null,一行的CardUsage字段为null),而它们本应是同一行的一部分。

理想的结果是每个CashSale有一条记录,当它们存在时,CardUsage字段在同一行。如果没有CardUsage,则这些字段为空。

有人能建议我进行一些优化吗?或者给我指一些在线资源。我很难在网上找到有用的东西。

我已经尝试过将数据类型更改为仅需要的大小,这给了我一些性能,但没有我需要的那么多。

我完全对另一个解决方案持开放态度(可能使用powershell或类似的解决方案),但我无法让任何其他解决方案使用此xml。

这样的XML很难查询。你必须依赖隐含的顺序。SQL Server的XQuery并没有完全意识到一些奇特的导航内容,比如兄弟和下一个元素。但你可以做一个技巧:你可以在CTE中分解这一行,并使用一个简单的NodeIndex=NodeIndex+1来找到下一个元素。如果这被称为"CardUsage",你就接受它,否则就不接受。

请:尽量减少你的例子!

DECLARE @dummyTable TABLE(ID INT IDENTITY, XmlData XML);
INSERT INTO @dummyTable VALUES
(N'<Header SchemaVersion="2.1" CreationDateTime="20171114044600" ExportCounter="1" DepositDate="20171114" Producer="http://portal.provider.org.uk/v2">
<Session SessionId="a468dcd3fa963a49b274e64a5c714244" GTCCash="0" GTCRev="0" GTCTrans="0" DepositDateTime="20171112210811" DeviceType="Ticketer ETM" DeviceSerialNumber="0441105106901809" StartDateTime="20171112163000" EndDateTime="20171112210814" FareIdentifier="Ticketer">
<Routes>
<Route CarrierNumber="1" ServiceNumber="13" ServiceIdentifier="13">
<Journey JourneyNumber="639" JourneyType="SRV" Company="1" EtmDepotNumber="188" DriverDepotNumber="188" ContractType="UNK" StartTicketSerial="781" FinalTicketSerial="802" DriverNumber="214373" DutyNumber="214373" RunningBoard="420" JourneyStart="20171112173414" JourneyEnd="20171112190532" JourneyDirection="Outbound" VehicleNumber="67798" VehicleRegistration="67798" DriverName="GARY TAYLOR" VehicleDepotNumber="188" DutyDepotNumber="0" RunningBoardDepotNumber="188" ServiceDepotNumber="0" RunningBoardNumber="3325">
<DrivenStops>
<Stop OwnerNumber="1" RealArrival="20171112190057" RealDeparture="20171112190057" BoardingFareStageOwner="0" BoardingFareStageNumber="701" BoardingFareStageName="Town Square (58)" StopNumber="31" BusStopNumber="1" StopName="Town Square" AtcoCode="639001322" NaptanCode="" Latitude="55.165746" Longitude="-2.141343" TCA="">
<Events>
<CashSale TicketGUID="67177058-d0f8-422e-bb19-9245acea25fa" TicketNumber="802" BoardingFareStageOwner="0" BoardingFareStageNumber="701" BoardingFareStageName="Town Square (58)" DestinationFareStageOwner="0" DestinationFareStageNumber="683" DestinationFareStageName="Golden Lion (60)" IssueDateTime="20171112190057" NumPgersOnTicket="1.00" NumTransactions="1" TicketProductNumber="e8d237e2c67aba4bbc922d8daa46ce75" TicketProductName="SC Over 60" SoldTicketClassNo="18285" ISOCurrency="GBP" PaymentMethod="8" SalesPrice="0" ValuePrice="150" SoldTicketClassName="SC Over 60" />
<CardUsage SchemeIdentifier="CCS" CardId="63356000903500039" TransactionType="USAGE" EventDateTime="20171112190057" />
<CashSale TicketGUID="f7aa4ed4-a385-407d-b116-1c7eb69563be" TicketNumber="811" BoardingFareStageOwner="0" BoardingFareStageNumber="1421" BoardingFareStageName="Town Square (58)" IssueDateTime="20171112193332" NumPgersOnTicket="1.00" NumTransactions="1" TicketProductNumber="e49ac3fa349bfc40b2173163538c121e" TicketProductName="QR Accept Ad Wk" SoldTicketClassNo="19137" ISOCurrency="GBP" PaymentMethod="15" SalesPrice="0" SoldTicketClassName="QR Accept Ad Wk" />
<CardUsage SchemeIdentifier="Barcode" CardId="d66e1bb355df4668b95be2e4764d30a6" TransactionType="USAGE" EventDateTime="20171112193332" />
<CashSale TicketGUID="c6c2a867-676a-4d73-98d6-3a2571892b3d" TicketNumber="812" BoardingFareStageOwner="0" BoardingFareStageNumber="1421" BoardingFareStageName="Town Square (58)" IssueDateTime="20171112193336" NumPgersOnTicket="1.00" NumTransactions="1" TicketProductNumber="ed5f87065daf384bab7b0bd9f4b02c80" TicketProductName="Use Wk/M/3M/Yr" SoldTicketClassNo="18724" ISOCurrency="GBP" PaymentMethod="1" SalesPrice="0" SoldTicketClassName="Use Wk/M/3M/Yr" />
<CashSale TicketGUID="2c7038ea-0800-4e4e-9d20-35fdb129e317" TicketNumber="828" BoardingFareStageOwner="0" BoardingFareStageNumber="272" BoardingFareStageName="Town Square (58)" DestinationFareStageOwner="0" DestinationFareStageNumber="383" DestinationFareStageName="Long Street (33)" IssueDateTime="20171112204134" NumPgersOnTicket="1.00" NumTransactions="1" TicketProductNumber="38e47af2e0a23c4a85eebd96d419feef" TicketProductName="Adult Single" SoldTicketClassNo="18286" ISOCurrency="GBP" PaymentMethod="1" SalesPrice="250" SoldTicketClassName="Adult Single" />
</Events>
</Stop>
</DrivenStops>
</Journey>
</Route>
</Routes>
</Session>
</Header>');

--这是您的简化查询:

SELECT
st.value('@OwnerNumber', 'varchar(50)') As OwnerNumber,
cs.value('@SoldTicketClassName', 'varchar(50)') As SoldTicketClassName,
--please try to reduce your examples!
cs.value('@TicketGUID', 'varchar(50)') As TicketGUID,
cs.value('@CardId', 'varchar(50)') As CardId
FROM
@dummyTable AS XmlUpload CROSS APPLY
XmlData.nodes('Header/Session')   AS [Session](se)        CROSS APPLY
se.nodes('Routes/Route')          AS [Route](ro)          CROSS APPLY
ro.nodes('Journey')               As [Journey](jo)        CROSS APPLY
jo.nodes('DrivenStops/Stop')      As [Stop](st)           CROSS APPLY
st.nodes('Events/*')      As [CashSale](cs);
/*
+-------------+---------------------+--------------------------------------+----------------------------------+
| OwnerNumber | SoldTicketClassName | TicketGUID                           | CardId                           |
+-------------+---------------------+--------------------------------------+----------------------------------+
| 1           | SC Over 60          | 67177058-d0f8-422e-bb19-9245acea25fa | NULL                             |
+-------------+---------------------+--------------------------------------+----------------------------------+
| 1           | NULL                | NULL                                 | 63356000903500039                |
+-------------+---------------------+--------------------------------------+----------------------------------+
| 1           | QR Accept Ad Wk     | f7aa4ed4-a385-407d-b116-1c7eb69563be | NULL                             |
+-------------+---------------------+--------------------------------------+----------------------------------+
| 1           | NULL                | NULL                                 | d66e1bb355df4668b95be2e4764d30a6 |
+-------------+---------------------+--------------------------------------+----------------------------------+
| 1           | Use Wk/M/3M/Yr      | c6c2a867-676a-4d73-98d6-3a2571892b3d | NULL                             |
+-------------+---------------------+--------------------------------------+----------------------------------+
| 1           | Adult Single        | 2c7038ea-0800-4e4e-9d20-35fdb129e317 | NULL                             |
+-------------+---------------------+--------------------------------------+----------------------------------+
*/

--我假设,除了@TicketGUID之外,你还想要@CardId——如果有的话。试试这个:

WITH InnerNodesCTE AS
(
SELECT
st.value('@OwnerNumber', 'varchar(50)') As OwnerNumber,
--a running counter
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS NodeIndex,
--the node's name
cs.value('local-name(.)','nvarchar(100)') AS CsNodeName,
--the node in one piece
cs.query('.') AS InnerNode
FROM
@dummyTable AS XmlUpload CROSS APPLY
XmlData.nodes('Header/Session')   AS [Session](se)        CROSS APPLY
se.nodes('Routes/Route')          AS [Route](ro)          CROSS APPLY
ro.nodes('Journey')               As [Journey](jo)        CROSS APPLY
jo.nodes('DrivenStops/Stop')      As [Stop](st)           CROSS APPLY
st.nodes('Events/*')      As [CashSale](cs) 
)
SELECT OwnerNumber,
InnerNode.value('CashSale[1]/@TicketGUID', 'varchar(50)') As TicketGUID,
--here you can list all your CashSale attributes
CardUsageNode.value('CardUsage[1]/@CardId', 'varchar(50)') As CardId
--here you can list all your CardUsage attributes
FROM InnerNodesCTE AS cs
OUTER APPLY(SELECT cu.InnerNode AS CardUsageNode FROM InnerNodesCTE AS cu WHERE cu.NodeIndex = cs.NodeIndex+1 AND cu.CsNodeName='CardUsage') AS A
WHERE CsNodeName='CashSale'

结果

/*
+---+--------------------------------------+----------------------------------+
| 1 | 67177058-d0f8-422e-bb19-9245acea25fa | 63356000903500039                |
+---+--------------------------------------+----------------------------------+
| 1 | f7aa4ed4-a385-407d-b116-1c7eb69563be | d66e1bb355df4668b95be2e4764d30a6 |
+---+--------------------------------------+----------------------------------+
| 1 | c6c2a867-676a-4d73-98d6-3a2571892b3d | NULL                             |
+---+--------------------------------------+----------------------------------+
| 1 | 2c7038ea-0800-4e4e-9d20-35fdb129e317 | NULL                             |
+---+--------------------------------------+----------------------------------+
*/

相关内容

最新更新