如何从复杂查询中创建一个表,并通过不断实时/更新来拥有它



我在一家服装公司工作,处理他们的订单数据。 订单数据分为 3 个数据库。包含其订单信息的ERP数据,如价格,数量等。 包含该订单的所有发货信息的发货数据。 以及一个 Shopify 数据库,其中包含使用 API 的 Shopify 的所有信息。

在过去的几个月里,我已经针对这三个表中的所有数据创建了一个全面的仪表板。 我一直在针对此数据编写存储过程,但问题是运行需要很长时间,因为我必须跨多个数据库联接多个表。

我想创建一个每天运行一到两次的作业,读取自上次运行以来收到的所有新订单,并将所有这些数据添加到一个数据库上的一个表中。 然后我所有的查询都将使用这个表,使一切运行得更快。

我还希望工作不仅要添加新订单,还要更新旧订单。假设我的表有"状态"字段。首次创建订单时,状态将为"已订购",但 2 天后它可能变为"已发货"。 而其他字段(例如Shipment_Number字段)在该信息更新之前将为 null。

有人可以帮我弄清楚如何做到这一点吗?

这是我的大型查询,它收集了我需要的所有数据:

SELECT DISTINCT
--TOP 10
O.Company_Code
,O.Division_Code
,O.Date_Entered
,O.Customer_Number
,O.Control_Number
,O.Customer_Purchase_Order_Number
,P.PickTicket_Number
,CASE 
WHEN sh.packslip IS NULL 
THEN ph.packslip
ELSE sh.packslip 
END Packslip
,PST.Stage_Code Pickticket_Stage_Code
,I.Invoice_Number
,IST.Stage_Code Inovice_Stage_Code
,CASE  
WHEN sh.packslip IS NULL
THEN z.status 
ELSE 'SHIPPED & UPLOADED' 
END Accellos_Status
,O.Order_Type_Code
--,O.Order_Status
,CASE 
WHEN o.Order_Status = 30 THEN 'CANCELED' 
WHEN o.Order_Status = 45 THEN 'SHIPPED' 
WHEN o.Order_Status = 10 THEN 'NORMAL' 
WHEN o.Order_Status = 20 THEN 'NORMAL' 
END Order_Status
,OD.Odet_Line_Number
--,OD.Line_Status
,CASE 
WHEN od.Line_Status = 90 THEN 'ALLOCATED'
WHEN od.Line_Status = 80 THEN 'CANCELED' 
WHEN od.Line_Status = 70 THEN 'SHIPPED' 
WHEN od.Line_Status = 50 THEN 'NORMAL' 
END Line_Status
,S.Master_Item 
,OD.Item_Number
,S.Description
,OD.Color_Code
,C.Color_Description
,OD.Quantity_Ordered
,OD.Quantity_Allocated
,OD.Quantity_Invoiced
,RT.Retail Unit_Retail
,(RT.Retail-OD.Price) Unit_MarkDown
,OD.Price Sell_Price
,OD.Discount_Value
,(OD.Price - OD.Discount_Value) Sale_Price
,((OD.Price - OD.Discount_Value)* (Royalty_Percentage/100)) Royalty_Cost

,CASE 
WHEN ISNULL(s.Actual_Cost, 0) = 0
THEN ISNULL(s.Standard_Cost, 0)
ELSE s.Actual_Cost  
END Item_Cost
,SR.Royalty_Code
,RC.Royalty_Percentage
,OD.Freight_Charges
--,CASE 
--  WHEN sh.packslip is null 
--      THEN ph.COST_SHIP 
--      ELSE sh.COST_SHIP 
--END Freight_Cost
,SM.Cost_Ship Master_Freight_Cost
,OD.Tax_Value
,CASE WHEN (OD.Quantity_Ordered <> OD.Quantity_Invoiced and OD.Quantity_Allocated=0) THEN OD.Quantity_Ordered - OD.Quantity_Invoiced ELSE 0 END ShortShippedAmount
,RD.Return_Number
,RD.Quantity_Returned
--,OW.Email
,CASE 
WHEN AE.Email IS NULL -- Does it matter which one I do for then and for else? One is in the same database
THEN OW.Email 
ELSE AE.Email 
END Email
,CASE  --Is there a better way to do this? When one table has the information and the other does't
WHEN sh.packslip IS NULL 
THEN ph.ship_name 
ELSE sh.SHIP_NAME 
END Customer_Name
,CASE 
WHEN sh.packslip IS NULL 
THEN ph.SHIP_ADD1 + ' ' + ph.SHIP_ADD2 
ELSE sh.SHIP_ADD1 + ' ' + sh.SHIP_ADD2 
END Customer_Addr 
,CASE 
WHEN sh.packslip IS NULL 
THEN ph.SHIP_CITY 
ELSE sh.SHIP_CITY 
END Customer_City
,CASE 
WHEN sh.packslip is null 
THEN ph.SHIP_PROV 
ELSE sh.SHIP_PROV 
END Customer_State
,CASE 
WHEN sh.packslip is null 
THEN ph.SHIP_ZIP 
ELSE sh.SHIP_ZIP 
END Customer_Zip 
,o.Ship_Via_Code
,CASE 
WHEN sh.packslip is null 
THEN ph.SHIP_VIA 
ELSE sh.SHIP_VIA
END Ship_Via
,CASE 
WHEN sh.packslip is null 
THEN ph.SERVICE 
ELSE sh.SERVICE 
END SERVICE
,CASE 
WHEN sh.packslip is null 
THEN ph.SHIP_NUM 
ELSE sh.SHIP_NUM 
END Ship_Num
,OW.Gateway
,OW.Tags
,OW.FinancialState
,OW.FulfillmentState
,OW.OrderCreated Shopify_Ordered
,OW.DateCreated Pro4Soft_Created
,OW.DownloadTime Pro4Soft_Downloaded
,OW.ConfirmedDownloadTime Pro4Soft_Confirmed
,FW.DateCreated Pro4Soft_Fulfilled
,OW.OrderUpdated Pro4Soft_Updated
,OW.OrderClosed Pro4Soft_Closed
,OW.OrderCancelled Pro4Soft_Cancelled
,OW.LastSynced Pro4Soft_Last_Synced
,OTSS.AMT_Created
,OTSS.AMT_Invoiced
,OTSS.AMT_Cancelled
,PTSS.Pickticket_Created
,PTSS.Pickticket_Allocated
,RLTSS.Accellos_Download
,RLTSS.Accellos_Allocated
,RLTSS.Accellos_Waved
,RLTSS.Accellos_Label
,RLTSS.Accellos_Last_Pick
,RLTSS.Accellos_Rating
,RLTSS.Accellos_Shipped
,RLTSS.Accellos_Upload
,CASE 
WHEN sh.packslip IS NULL
THEN ph.Date_modfy 
ELSE sh.Date_modfy 
END Accellos_Timestamp
,SM.Tracktrace
--,*
--AMT Tables
FROM        [JMNYC-AMTDB].[AMTPLUS].[dbo].Orders O WITH (NOLOCK) 
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].Order_Detail OD WITH (NOLOCK)             
on O.Company_Code = OD.Company_Code
and O.Division_Code = OD.Division_Code
and O.Control_Number = OD.Control_Number
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].PickTickets P WITH (NOLOCK)       
on O.Company_Code = P.Company_Code 
and O.Division_Code = P.Division_Code
and O.Control_Number = P.Control_Number
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].PickTickets_Stage PST WITH (NOLOCK) 
ON o.Company_Code=pst.Company_Code 
AND o.Division_Code=pst.Division_Code 
AND p.PickTicket_Number=pst.PickTicket_Number
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices I WITH (NOLOCK)              
on O.Company_Code = I.Company_Code 
and O.Division_Code = I.Division_Code
and O.Control_Number = I.Control_Number
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices_Detail ID WITH (NOLOCK)      
on O.Company_Code = ID.Company_Code 
and O.Division_Code = ID.Division_Code
and O.Control_Number = ID.Control_Number
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].Invoices_Stage IST WITH (NOLOCK)      
ON o.Company_Code = ist.Company_Code  
AND o.Division_Code = ist.Division_Code 
AND i.Invoice_Number = ist.Invoice_Number
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].Style S WITH (NOLOCK)                     
on O.Company_Code = S.Company_Code
and O.Division_Code = S.Division_Code
and OD.Item_Number = S.Item_Number
and OD.Color_Code = S.Color_Code
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].Color C WITH (NOLOCK)                     
on S.Color_Code = C.Color_Code
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].Returns_Header RH WITH (NOLOCK)   
on O.Company_Code = RH.Company_Code
and O.Division_Code = RH.Division_Code
and I.Invoice_Number = RH.Invoice_Number
and O.Customer_Purchase_Order_Number = RH.Customer_Purchase_Order_Number
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].Returns_Detail RD WITH (NOLOCK)   
on O.Company_Code = RD.Company_Code
and O.Division_Code = RD.Division_Code
and O.Control_Number = RD.Control_Number
and RH.Return_Number = RD.Return_Number
and OD.Item_Number = RD.Item_Number
and OD.Color_Code = RD.Color_Code
and OD.Odet_Line_Number = RD.Odet_Line_Number
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].Style_Royalty SR WITH (NOLOCK)        
on S.Company_Code = SR.Company_Code
and S.Division_Code = SR.Division_Code
and S.Item_Number = SR.Item_Number
and S.Color_Code = SR.Color_Code
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].Royalty_By_Customer RC WITH (NOLOCK)  
on S.Company_Code = RC.Company_Code
and S.Division_Code = RC.Division_Code
and SR.Royalty_Code = RC.Royalty_Code
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].Z_N_FullRetailTable RT WITH (NOLOCK)
on OD.Item_Number = RT.SKU
and OD.Color_Code = RT.Color
LEFT JOIN   [JMNYC-AMTDB].[AMTPLUS].[dbo].ArchivedEmails AE WITH (NOLOCK) 
ON O.Customer_Purchase_Order_Number = AE.OrderNumber
--WMS Tables
LEFT JOIN   [JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].pickhead ph WITH (NOLOCK) --When the order is not shipped and invoiced yet it is in this table.
ON p.PickTicket_Number = ph.packslip
LEFT JOIN   [JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].SHIPHIST sh WITH (NOLOCK) --After order is Shipped it gets deleted from Pickhead and moves to this table.
ON o.Customer_Purchase_Order_Number = sh.cust_po
LEFT JOIN   [JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].SHIPMSTR2 SM WITH (NOLOCK) 
ON o.Customer_Purchase_Order_Number = sm.Cust_Order
and sh.packslip=sm.packslip
LEFT JOIN   [JMDNJ-ACCELSQL].[A1WAREHOUSE].[dbo].Z_Status z WITH (NOLOCK) 
ON ph.PROCSTEP = z.procstep
--Shopify Tables
LEFT JOIN   [SHOPIFY].[shopify_moret].[dbo].[OrderWrappers] OW WITH (NOLOCK)
ON O.Customer_Purchase_Order_Number = OW.OrderName
LEFT JOIN   [SHOPIFY].[shopify_moret].[dbo].[FulfillmentWrappers] FW WITH (NOLOCK)
ON OW.Id =FW.OrderWrapperId
--Timestampt Tables (Mix of AMT and WMS)
LEFT JOIN (
SELECT
Company_Code
,Division_Code
,Control_Number
,max( case when Action like 'New%' then Date end) as AMT_Created
,max( case when Action like '%Invoice%' then Date end) as AMT_Invoiced
,max( case when Action like '%Cancel%' then Date end) as AMT_Cancelled
FROM(
SELECT DISTINCT
Company_Code
,Division_Code
,Control_Number
,Date
,Action 
from [JMNYC-AMTDB].[AMTPLUS].[dbo].Order_Audit_Trail WITH (NOLOCK)
)OTS
group by Company_Code, Division_Code, Control_Number
)OTSS 
on O.Control_Number = OTSS.Control_Number 
and O.Company_Code = OTSS.Company_Code 
and O.Division_Code = OTSS.Division_Code
LEFT JOIN (
SELECT
Company_Code
,Division_Code
,PickTicket_Number
,max( case when Action like 'Created%' then Date end) as   Pickticket_Created
,max( case when Action like 'Inserted%' then Date end) as  Pickticket_Allocated
,max( case when Action like 'Updated%' then Date end) as  Pickticket_Updated
--,max( case when Action like '%Cancel%' then Date end) as Pickticket_Cancelled
FROM(
SELECT DISTINCT
Company_Code
,Division_Code
,PickTicket_Number
,Date
,Action 
from [JMNYC-AMTDB].[AMTPLUS].[dbo].PickTickets_Audit_Trail WITH (NOLOCK)
)PTS
group by Company_Code, Division_Code, PickTicket_Number
)PTSS 
on O.Company_Code = PTSS.Company_Code 
and O.Division_Code = PTSS.Division_Code 
and PTSS.PickTicket_Number = COALESCE( sh.packslip, P.PickTicket_Number) 
LEFT JOIN (
SELECT
Packslip
,max( case when Action like 'DNLOAD' then Date_Time end) as     Accellos_Download
,max( case when Action like 'ALLOC' then Date_Time end) as      Accellos_Allocated
,max( case when Action like 'WAVEORDR' then Date_Time end) as   Accellos_Waved
,max( case when Action like 'NEWLABEL' then Date_Time end) as   Accellos_Label
,max( case when Action like 'EOL_LSTP' then Date_Time end) as   Accellos_Last_Pick
,max( case when Action like 'RATED' then Date_Time end) as      Accellos_Rating
,max( case when Action like 'SHIPPED' then Date_Time end) as    Accellos_Shipped
,max( case when Action like 'UPLOAD' then Date_Time end) as     Accellos_Upload
FROM(
SELECT DISTINCT
Packslip
,Date_Time
,Action 
from [JMDNJ-ACCELSQL].[A1Warehouse].[dbo].[RF_LOG2] RL WITH (NOLOCK)
)RLTS
group by Packslip
)RLTSS 
on  RLTSS.PACKSLIP = Coalesce(sh.PACKSLIP, TRY_CAST(P.pickticket_number as nvarchar))
WHERE                                                        
--(O.Company_Code = '09' OR '09' IS NULL) AND 
--(O.Division_Code = '001' OR '001' IS NULL) AND
O.Customer_Number = 'ecom2x' AND 
ISNUMERIC(O.Customer_Purchase_Order_Number) <> 0
AND O.Order_Type_Code <> 'B'
AND O.Date_Entered BETWEEN '8/1/2019' AND DATEADD(dayofyear, 1, GETDATE())
ORDER BY O.Date_Entered DESC

如果我像这样运行此查询,过去一周的订单需要四个多小时,谁知道自时间开始以来所有订单需要多少小时。(尽管似乎添加行不会成倍增加时间)。

我认为将此数据插入表中并不太困难。我可以在查询之前添加一个插入语句,让它在一夜之间运行。

困难的部分是保持它的"活力">

我认为最好的方法是做一次大的插入语句。这会将大部分数据放入表中。

然后,我将创建一个运行此查询的作业,但仅限于过去 40 天(因为 40 天后,订单信息几乎不会更改)。

下面是一个示例结果:

Company_Code    Division_Code   Date_Entered    Customer_Number Control_Number  Customer_Purchase_Order_Number  PickTicket_Number   Packslip    Pickticket_Stage_Code   Invoice_Number  Inovice_Stage_Code  Accellos_Status Order_Type_Code Order_Status    Odet_Line_Number    Line_Status Master_Item Item_Number Description Color_Code  Color_Description   Quantity_Ordered    Quantity_Allocated  Quantity_Invoiced   Unit_Retail Unit_MarkDown   Sell_Price  Discount_Value  Sale_Price  Royalty_Cost    Item_Cost   Royalty_Code    Royalty_Percentage  Freight_Charges Master_Freight_Cost Tax_Value   ShortShippedAmount  Return_Number   Quantity_Returned   Email   Customer_Name   Customer_Addr   Customer_City   Customer_State  Customer_Zip    Ship_Via_Code   Ship_Via    SERVICE Ship_Num    Gateway Tags    FinancialState  FulfillmentState    Shopify_Ordered Pro4Soft_Created    Pro4Soft_Downloaded Pro4Soft_Confirmed  Pro4Soft_Fulfilled  Pro4Soft_Updated    Pro4Soft_Closed Pro4Soft_Cancelled  Pro4Soft_Last_Synced    AMT_Created AMT_Invoiced    AMT_Cancelled   Pickticket_Created  Pickticket_Allocated    Accellos_Download   Accellos_Allocated  Accellos_Waved  Accellos_Label  Accellos_Last_Pick  Accellos_Rating Accellos_Shipped    Accellos_Upload Accellos_Timestamp  Tracktrace
1   7   14:00.4 ECOM2X      511322  1721057 NULL    11422781    NULL    10704058    PRINTREADY  SHIPPED & UPLOADED  N   SHIPPED 1   SHIPPED 1950    17DN19501XLEC   PLUS SIZE BERMUDA SHORT 115 RICH BLACK                                          2   0   2   28  0   28  0   28  1.96    6.49    DANSKIN     7   0   7.71    0   0   NULL    NULL    wwright5333@yahoo.com   Wendy WRIGHT    5596 NORTH RD   FRIENDSVILLE    PA  18818   SGRD    USPS    Priority Mail   SHP1580902  shopify_payments    Riskified::approved, Riskified::submitted   paid    fulfilled   2019-08-06 12:12:47.0000000 +00:00  2019-08-06 12:13:30.2019290 +00:00  2019-08-06 12:13:41.9336199 +00:00  2019-08-06 12:14:20.5300000 +00:00  2019-08-06 22:31:37.3958947 +00:00  2019-08-06 23:45:11.0000000 +00:00  2019-08-06 22:31:05.0000000 +00:00  NULL    2019-08-06 23:45:30.1978836 +00:00  14:00.5 30:09.6 NULL    14:01.0 NULL    20190806 08:17:23.67    20190806 08:19:43.82    20190806 08:19:56.43    20190806 08:19:57.18    20190806 11:18:57.31    20190806 12:50:32.00    20190806 15:57:50.87    20190806 15:58:01.76    57:50.8 9.40552E+21

这就是我需要帮助的。

如何创建一个过程/作业来比较表中过去 30-40 天的订单,并查看查询行中的新增内容。 然后,我可以每天运行此作业一次或两次,以更新所有时间戳和运输信息。

也许创建此作业的两个版本会更好?一个每天仅运行一次,用于更新所有时间戳信息,另一个运行一次,用于更新所有返回信息,用于更新所有返回信息。 还是这太复杂了?

作业需要检查每行是否有任何更改,以及是否有更新。如果没有,什么都不做。有没有办法优化它?就像一旦它看到行没有变化,它就会跳过它。

(另外,如果有人对查询有任何一般提示,我将不胜感激。我试图使它尽可能可读。

另外,像这样做所有的数学计算更好吗

,((OD.Price - OD.Discount_Value)* (Royalty_Percentage/100)) Royalty_Cost

在存储过程中,因此结果已经在我的表中,还是对我将针对此新表进行的查询进行计算更好?还是没关系?

我一直在针对此数据编写存储过程,但问题是运行需要很长时间,因为我必须跨多个数据库联接多个表。

"多个数据库"问题不是问题。 SQL Server数据库是管理和安全边界,但跨数据库选择/联接没有任何困难。 所有数据库统计信息都可用于查询优化器。

我想创建一个工作...将所有这些数据添加到一个数据库上的一个表中

小心你的愿望。 正如您已经预见的那样,维护该表将是有效的,并且永远不会是最新的。 即使在更新时,基表中的信息也可能更改。

具体化视图建议很好,但此查询的某些方面确实使查询优化器变得困难,从表数开始。

最简单的改进是使用合并而不是IF X IS NULL构造。 这可能会对某些人有所帮助。

我假设您有适合每个连接的索引,并且您已确认它们正在被使用。 如果索引不是聚集索引,您可能会发现覆盖索引更有帮助。 (覆盖索引具有许多与实例化视图相同的优点。

您使用"时间戳表"的方式尤其成问题。 这个微不足道的错误是 SELECT DISTINCT 在结果聚合的子查询中(使用 GROUP BY)。 区分是不必要的;如果幸运的话,无论如何都会在优化过程中消除它。

更重要的是你为这样的结构付出的代价

,max( case when Action like 'New%' then Date end) as AMT_Created

因为系统必须对每一行进行模式匹配,不止一次。 如果需要将任何内容写入中间表,那就是这些查询。

左联接也会带来成本。 它们真的有必要吗? 也许您可以定期(可能每 15 分钟左右)检查一次跨数据库一致性,并在查询中使用内部联接。

还有两点:

  1. 确保服务器有足够的 RAM。 越多越好,尤其是对于联接。
  2. 努力完全理解查询优化器的报告,并确保每个存储过程都以最佳方式访问表。 一般来说,SQL Server 性能的解决方案是首先索引,其次是临时表,最后是其他所有内容。

每个人迟早都会试图用一张大"扁平"桌子来解决性能问题。 这是很多工作,而且很少有好结果。 最好隔离奇怪的麻烦制造者,并将数据库中的重复数据保持在最低限度。

最新更新