获取每个Customer的第一个和最后一个Order以及每个订单中的最高值Item,所有这些都是单独的表



我需要根据OrderDate查找每个客户的第一个和最后一个订单,以及每个订单中业务量最高的商品的名称和SKU。作为参考,Customer表有150k条记录,Orders和OrderDetails(这些是Items)更多。

注意:订单和他们各自的项目应该与客户在同一行

订单
OrderID   OrderDate     CustomerID   BusinessVolumeTotal   Subtotal
13212     '2021-09-06'  512312       500.00                25.60

OrderDetails

OrderID   ItemCode   ItemDescription   BusinessVolume
13212     'SKW-BS'   'Some item'       450.00

在我的第一个查询中,我试图坚持连接而不是子查询,这导致了这个

select distinct(c.CustomerID), c.FirstName + ' ' + c.LastName as Name,
cs.CustomerStatusDescription as Status,
ct.CustomerTypeDescription as Type, pv.Volume80 as G3,
fo.OrderID,fo.OrderDate,fo.SubTotal,fo.Country, fod.ItemCode, fod.ItemDescription, fopt.PriceTypeID,
lo.OrderID,lo.OrderDate,lo.SubTotal,lo.Country, lod.ItemCode, lod.ItemDescription, lopt.PriceTypeID
from Customers c
left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
left join Orders fo on fo.CustomerID = c.CustomerID -- First Order
left join Orders lo on lo.CustomerID = c.CustomerID -- Last Order
left join OrderDetails fod on fod.OrderID = fo.OrderID
left join OrderDetails lod on lod.OrderID = lo.OrderID
left join PriceTypes fopt on fo.PriceTypeID = fopt.PriceTypeID
left join PriceTypes lopt on lo.PriceTypeID = lopt.PriceTypeID
where c.CustomerStatusID in (1,2)
and c.CustomerTypeID in (2,3)
and pv.PeriodTypeID = 2
/* First Order */
and fo.OrderID = (select top 1(OrderID) from Orders where CustomerID = c.CustomerID and OrderStatusID>=7 order by OrderDate )
and fod.ItemID = (select top 1(ItemID) from OrderDetails where OrderID = fo.OrderID order by BusinessVolume)
/* Last Order */
and lo.OrderID = (select top 1(OrderID) from Orders where CustomerID = c.CustomerID and OrderStatusID>=7 order by OrderDate desc)
and lod.ItemID = (select top 1(ItemID) from OrderDetails where OrderID = lo.OrderID order by BusinessVolume desc)
and pv.PeriodID = (select PeriodID from Periods where PeriodTypeID=2 and StartDate <= @now and EndDate >= @now)

但这最终执行了大约6-7分钟。从解释计划来看,看起来大部分都被基于OrderStatusID>= 7的订单键查找所占用。

所以我尝试使用窗口函数来达到同样的效果:

select distinct(c.CustomerID), c.FirstName + ' ' + c.LastName as Name, cs.CustomerStatusDescription as Status,
ct.CustomerTypeDescription as Type,  pv.Volume80 as G3, 
fal.*
from Customers c
left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
left join(
select
CustomerID,
max(case when MinDate = 1 then OrderID end)                 FirstOrderID,
max(case when MinDate = 1 then OrderDate end)               FirstOrderDate,
max(case when MinDate = 1 then BusinessVolumeTotal end)     FirstBVTotal,
max(case when MinDate = 1 then PriceTypeDescription end)    FirstPriceType,
max(case when MinDate = 1 then ItemCode end)                FirstItemCode,
max(case when MinDate = 1 then ItemDescription end)         FirstItemDescription,
max(case when MaxDate = 1 then OrderID end)                 LastOrderID,
max(case when MaxDate = 1 then OrderDate end)               LastOrderDate,
max(case when MaxDate = 1 then BusinessVolumeTotal end)     LastBVTotal,
max(case when MaxDate = 1 then PriceTypeDescription end)    LastPriceType,
max(case when MaxDate = 1 then ItemCode end)                LastItemCode,
max(case when MaxDate = 1 then ItemDescription end)         LastItemDescription
from
(
select distinct o.CustomerID,
o.OrderID,
o.OrderDate,
o.BusinessVolumeTotal,
PT.PriceTypeDescription,
RANK() over (partition by o.CustomerID order by OrderDate) as MinDate,
RANK() over (partition by o.CustomerID order by OrderDate desc) as MaxDate,
FIRST_VALUE(ItemCode) over (partition by od.OrderID order by BusinessVolume desc) as ItemCode,
FIRST_VALUE(ItemDescription) over (partition by od.OrderID order by BusinessVolume desc) as ItemDescription
from Orders o
left join OrderDetails od on od.OrderID = o.OrderID
left join PriceTypes PT on o.PriceTypeID = PT.PriceTypeID
where o.OrderStatusID >= 7
) fal
group by CustomerID
) fal on c.CustomerId = fal.CustomerID
where c.CustomerStatusID in (1,2)
and c.CustomerTypeID in (2,3)
and pv.PeriodTypeID = 2
/* CurrentG3 */
and pv.PeriodID = (select PeriodID from Periods where PeriodTypeID=2 and StartDate <= @now and EndDate >= @now)

唉,结果执行的时间更长了。如果可能的话,我需要一种方法来优化它。

二次查询我还需要最近3个月、6个月和12个月的每笔订单的数量和总数量。我目前以编程方式在原始返回结果之后作为辅助查询执行此操作,并且转发customerid,如下所示:

select count(OrderID) as Cnt, sum(BusinessVolumeTotal) as Bv, CustomerID
from Orders where OrderStatusID > 6 and OrderTypeID in (1,4,8,11)
and OrderDate >= @timeAgo and CustomerID in @ids group by CustomerID

乘以3,因为3 6和12个月。理想情况下,我也想使这一部分的原始,但我真的没有一个好主意,如何做到这一点,特别是如何复杂的连接是与订单。

理想情况下,结果表应该是这样的

CustomerID  Name       CustomerStatus  CustomerType  FirstOrderID  FirstOrderDate  FirstBVTotal FirstItemCode  FirstItemDesc  FirstPriceType  LastOrderID  LastOrderDate  LastBVTotal  LastItemCode  LastItemDesc  LastPriceType  ThreeMonthCount  ThreeMonthTotal  SixMonthCount SixMonthTotal  TwelveMonthCount  TwelveMonthTotal
512312     'Jane Doe'  'Active'        'Retail'      13212         '2020-06-06'    50.00        'Item1'        'Item 1 desc'  'Retail'        14321        '2021-09-01'   200.00       'Item2'       'Item 2 desc' 'Retail'       45               4305.00          76            8545.60        183               21542.95

任何关于如何优化或减少查询的帮助和建议,以及任何你认为我做错了,我将非常感激。

注:我不知道这个标题是否合适,我是否可以稍后更改它,我已经有一段时间没有使用SO来问题了。

<标题>

更新查询1的实际执行计划:
https://www.brentozar.com/pastetheplan/?id=SJd56RSmK

查询2实际执行计划:
https://www.brentozar.com/pastetheplan/?id=BJ7QHk87Y

我认为您需要记住这类查询的两个要点:

  • 窗口函数性能良好的关键是不要引入不必要的排序。所以当你可以用ROW_NUMBER得到任何一个方向的一阶时,你不应该用另一个相反的ROW_NUMBER来得到最后一个。而是使用LEAD来检查下一行是否存在,从而告诉您这是否是最后一行。然后你可以使用条件聚合。
  • 通常有两种方法来计算first/last:如上所述的行编号解决方案,或者APPLY,它会挑选出您需要的确切的行编号。
    我认为对于OrderDetails我们应该使用apply,因为我们只需要找到每个客户的两个订单。这确实需要良好的索引,所以如果OrderDetails没有很好地索引,那么您可能也需要切换到行编号解决方案。
select
c.CustomerID,
c.FirstName + ' ' + c.LastName as Name,
cs.CustomerStatusDescription as Status,
ct.CustomerTypeDescription as Type,
pv.Volume80 as G3,
o.FirstOrderID,
o.FirstOrderDate,
o.FirstSubTotal,
o.FirstCountry,
fod.ItemCode as FirstItemCode,
fod.ItemDescription as FirstItemDescription,
fopt.PriceTypeDescription as FirstPriceTypeDescription,
o.LastOrderID,
o.LastOrderDate,
o.LastSubTotal,
o.LastCountry,
lod.ItemCode as LastItemCode,
lod.ItemDescription as LastItemDescription,
lopt.PriceTypeDescription as LastPriceTypeDescription 
from Customers c
left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
and pv.PeriodTypeID = 2
and pv.PeriodID = (
select top 1 PeriodID
from Periods p
where p.PeriodTypeID = 2
and p.StartDate <= @now
and p.EndDate >= @now
)
left join (
select
o.CustomerID,
min(case when rn = 1 then OrderID end) as FirstOrderId,
min(case when rn = 1 then OrderDate end) as FirstOrderDate,
min(case when rn = 1 then SubTotal end) as FirstSubTotal,
min(case when rn = 1 then Country end) as FirstCountry,
min(case when nx is null then OrderID end) as LastOrderId,
min(case when nx is null then OrderDate end) as LastOrderDate,
min(case when nx is null then SubTotal end) as LastSubTotal,
min(case when nx is null then Country end) as LastCountry,
count(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then 1 end) as ThreeMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then BusinessVolumeTotal end) as ThreeMonthTotal,
count(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then 1 end) as SixMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then BusinessVolumeTotal end) as SixMonthTotal,
count(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then 1 end) as TwelveMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then BusinessVolumeTotal end) as TwelveMonthTotal
from (
select *,
ROW_NUMBER() over (partition by o.CustomerID order by OrderDate) as rn,
LEAD(OrderID) over (partition by o.CustomerID order by OrderDate) as nx
from Orders o
where o.OrderStatusID >= 7
and o.OrderTypeID in (1,4,8,11)
and o.OrderDate >= @timeAgo
) o
group by o.CustomerID
) o on o.CustomerID = c.CustomerID
outer apply (
select top 1
od.ItemCode,
od.ItemDescription
from OrderDetails od
order by od.BusinessVolume desc
where od.OrderID = o.FirstOrderId
) fod
outer apply (
select top 1
od.ItemCode,
od.ItemDescription
from OrderDetails od
order by od.BusinessVolume desc
where od.OrderID = o.LastOrderId
) lod
left join PriceTypes fopt on fopt.PriceTypeID = o.FirstPriceTypeID 
left join PriceTypes lopt on lopt.PriceTypeID = o.LastPriceTypeID 
where c.CustomerStatusID in (1,2)
and c.CustomerTypeID in (2,3);

我还将给你一个行编号的版本,从你的执行计划来看,它实际上可能更好。你需要同时尝试

select
c.CustomerID,
c.FirstName + ' ' + c.LastName as Name,
cs.CustomerStatusDescription as Status,
ct.CustomerTypeDescription as Type,
pv.Volume80 as G3,
o.FirstOrderID,
o.FirstOrderDate,
o.FirstSubTotal,
o.FirstCountry,
o.FirstItemCode,
o.FirstItemDescription,
o.FirstPriceTypeDescription,
o.LastOrderID,
o.LastOrderDate,
o.LastSubTotal,
o.LastCountry,
o.LastItemCode,
o.LastItemDescription,
o.LastPriceTypeDescription 
from Customers c
left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
and pv.PeriodTypeID = 2
and pv.PeriodID = (
select top 1 PeriodID
from Periods p
where p.PeriodTypeID = 2
and p.StartDate <= @now
and p.EndDate >= @now
)
left join (
select
o.CustomerID,
min(case when rn = 1 then o.OrderID end) as FirstOrderId,
min(case when rn = 1 then o.OrderDate end) as FirstOrderDate,
min(case when rn = 1 then o.SubTotal end) as FirstSubTotal,
min(case when rn = 1 then o.Country end) as FirstCountry,
min(case when rn = 1 then od.ItemCode end) as FirstItemCode,
min(case when rn = 1 then od.ItemDescription end) as FirstItemDescription,
min(case when rn = 1 then opt.PriceTypeDescription end) as FirstPriceTypeDescription,
min(case when nx is null then o.OrderID end) as LastOrderId,
min(case when nx is null then o.OrderDate end) as LastOrderDate,
min(case when nx is null then o.SubTotal end) as LastSubTotal,
min(case when nx is null then o.Country end) as LastCountry,
min(case when nx is null then od.ItemCode end) as LastItemCode,
min(case when nx is null then od.ItemDescription end) as LastItemDescription,
min(case when nx is null then opt.PriceTypeDescription end) as LastPriceTypeDescription,
count(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then 1 end) as ThreeMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then BusinessVolumeTotal end) as ThreeMonthTotal,
count(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then 1 end) as SixMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then BusinessVolumeTotal end) as SixMonthTotal,
count(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then 1 end) as TwelveMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then BusinessVolumeTotal end) as TwelveMonthTotal
from (
select *,
ROW_NUMBER() over (partition by o.CustomerID order by OrderDate) as rn,
LEAD(OrderID) over (partition by o.CustomerID order by OrderDate) as nx
from Orders o
where o.OrderStatusID >= 7
and o.OrderTypeID in (1,4,8,11)
and o.OrderDate >= @timeAgo
) o
left join PriceTypes opt on opt.PriceTypeID = o.PriceTypeID 
join (
select *,
ROW_NUMBER() over (partition by od.OrderID order by od.BusinessVolume desc) as rn
from OrderDetails od
) od on od.OrderID = o.OrderId
where rn = 1 or nx is null
) o on o.CustomerID = c.CustomerID
where c.CustomerStatusID in (1,2)
and c.CustomerTypeID in (2,3);

良好的索引对于良好的性能至关重要。我希望在您的表上大致有以下索引,无论是集群索引还是非集群索引(集群索引INCLUDE自动每隔一列),如果需要,显然可以添加其他INCLUDE列:

Customers (CustomerID) INCLUDE (FirstName, LastName)
CustomerTypes (CustomerTypeID) INCLUDE (CustomerTypeDescription)
CustomerStatuses (CustomerStatusID) INCLUDE (CustomerTypeDescription)
PeriodVolumes (CustomerID) INCLUDE (Volume80)
Periods (PeriodTypeID, StartDate, PeriodID) INCLUDE (EndDate)  -- can swap Start and End
Orders (CustomerID, OrderDate) INCLUDE (OrderStatusID, SubTotal, Country, BusinessVolumeTotal)
OrderDetails (OrderID, BusinessVolume) INCLUDE (ItemCode ItemDescription)
PriceTypes (PriceTypeID) INCLUDE (PriceTypeDescription)

您应该仔细考虑INNERLEFT连接,因为优化器可以更容易地移动INNER连接。

还要注意,DISTINCT不是一个函数,它是在整个列集上计算的。一般来说,如果查询中有一个DISTINCT,那么可以假设连接没有被正确地考虑过。

最新更新