我需要根据OrderDate查找每个客户的第一个和最后一个订单,以及每个订单中业务量最高的商品的名称和SKU。作为参考,Customer表有150k条记录,Orders和OrderDetails(这些是Items)更多。
注意:订单和他们各自的项目应该与客户在同一行
订单OrderID OrderDate CustomerID BusinessVolumeTotal Subtotal
13212 '2021-09-06' 512312 500.00 25.60
OrderDetails
OrderID ItemCode ItemDescription BusinessVolume
13212 'SKW-BS' 'Some item' 450.00
在我的第一个查询中,我试图坚持连接而不是子查询,这导致了这个
select distinct(c.CustomerID), c.FirstName + ' ' + c.LastName as Name,
cs.CustomerStatusDescription as Status,
ct.CustomerTypeDescription as Type, pv.Volume80 as G3,
fo.OrderID,fo.OrderDate,fo.SubTotal,fo.Country, fod.ItemCode, fod.ItemDescription, fopt.PriceTypeID,
lo.OrderID,lo.OrderDate,lo.SubTotal,lo.Country, lod.ItemCode, lod.ItemDescription, lopt.PriceTypeID
from Customers c
left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
left join Orders fo on fo.CustomerID = c.CustomerID -- First Order
left join Orders lo on lo.CustomerID = c.CustomerID -- Last Order
left join OrderDetails fod on fod.OrderID = fo.OrderID
left join OrderDetails lod on lod.OrderID = lo.OrderID
left join PriceTypes fopt on fo.PriceTypeID = fopt.PriceTypeID
left join PriceTypes lopt on lo.PriceTypeID = lopt.PriceTypeID
where c.CustomerStatusID in (1,2)
and c.CustomerTypeID in (2,3)
and pv.PeriodTypeID = 2
/* First Order */
and fo.OrderID = (select top 1(OrderID) from Orders where CustomerID = c.CustomerID and OrderStatusID>=7 order by OrderDate )
and fod.ItemID = (select top 1(ItemID) from OrderDetails where OrderID = fo.OrderID order by BusinessVolume)
/* Last Order */
and lo.OrderID = (select top 1(OrderID) from Orders where CustomerID = c.CustomerID and OrderStatusID>=7 order by OrderDate desc)
and lod.ItemID = (select top 1(ItemID) from OrderDetails where OrderID = lo.OrderID order by BusinessVolume desc)
and pv.PeriodID = (select PeriodID from Periods where PeriodTypeID=2 and StartDate <= @now and EndDate >= @now)
但这最终执行了大约6-7分钟。从解释计划来看,看起来大部分都被基于OrderStatusID>= 7的订单键查找所占用。
所以我尝试使用窗口函数来达到同样的效果:
select distinct(c.CustomerID), c.FirstName + ' ' + c.LastName as Name, cs.CustomerStatusDescription as Status,
ct.CustomerTypeDescription as Type, pv.Volume80 as G3,
fal.*
from Customers c
left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
left join(
select
CustomerID,
max(case when MinDate = 1 then OrderID end) FirstOrderID,
max(case when MinDate = 1 then OrderDate end) FirstOrderDate,
max(case when MinDate = 1 then BusinessVolumeTotal end) FirstBVTotal,
max(case when MinDate = 1 then PriceTypeDescription end) FirstPriceType,
max(case when MinDate = 1 then ItemCode end) FirstItemCode,
max(case when MinDate = 1 then ItemDescription end) FirstItemDescription,
max(case when MaxDate = 1 then OrderID end) LastOrderID,
max(case when MaxDate = 1 then OrderDate end) LastOrderDate,
max(case when MaxDate = 1 then BusinessVolumeTotal end) LastBVTotal,
max(case when MaxDate = 1 then PriceTypeDescription end) LastPriceType,
max(case when MaxDate = 1 then ItemCode end) LastItemCode,
max(case when MaxDate = 1 then ItemDescription end) LastItemDescription
from
(
select distinct o.CustomerID,
o.OrderID,
o.OrderDate,
o.BusinessVolumeTotal,
PT.PriceTypeDescription,
RANK() over (partition by o.CustomerID order by OrderDate) as MinDate,
RANK() over (partition by o.CustomerID order by OrderDate desc) as MaxDate,
FIRST_VALUE(ItemCode) over (partition by od.OrderID order by BusinessVolume desc) as ItemCode,
FIRST_VALUE(ItemDescription) over (partition by od.OrderID order by BusinessVolume desc) as ItemDescription
from Orders o
left join OrderDetails od on od.OrderID = o.OrderID
left join PriceTypes PT on o.PriceTypeID = PT.PriceTypeID
where o.OrderStatusID >= 7
) fal
group by CustomerID
) fal on c.CustomerId = fal.CustomerID
where c.CustomerStatusID in (1,2)
and c.CustomerTypeID in (2,3)
and pv.PeriodTypeID = 2
/* CurrentG3 */
and pv.PeriodID = (select PeriodID from Periods where PeriodTypeID=2 and StartDate <= @now and EndDate >= @now)
唉,结果执行的时间更长了。如果可能的话,我需要一种方法来优化它。
二次查询我还需要最近3个月、6个月和12个月的每笔订单的数量和总数量。我目前以编程方式在原始返回结果之后作为辅助查询执行此操作,并且转发customerid,如下所示:
select count(OrderID) as Cnt, sum(BusinessVolumeTotal) as Bv, CustomerID
from Orders where OrderStatusID > 6 and OrderTypeID in (1,4,8,11)
and OrderDate >= @timeAgo and CustomerID in @ids group by CustomerID
乘以3,因为3 6和12个月。理想情况下,我也想使这一部分的原始,但我真的没有一个好主意,如何做到这一点,特别是如何复杂的连接是与订单。
理想情况下,结果表应该是这样的
CustomerID Name CustomerStatus CustomerType FirstOrderID FirstOrderDate FirstBVTotal FirstItemCode FirstItemDesc FirstPriceType LastOrderID LastOrderDate LastBVTotal LastItemCode LastItemDesc LastPriceType ThreeMonthCount ThreeMonthTotal SixMonthCount SixMonthTotal TwelveMonthCount TwelveMonthTotal
512312 'Jane Doe' 'Active' 'Retail' 13212 '2020-06-06' 50.00 'Item1' 'Item 1 desc' 'Retail' 14321 '2021-09-01' 200.00 'Item2' 'Item 2 desc' 'Retail' 45 4305.00 76 8545.60 183 21542.95
任何关于如何优化或减少查询的帮助和建议,以及任何你认为我做错了,我将非常感激。
注:我不知道这个标题是否合适,我是否可以稍后更改它,我已经有一段时间没有使用SO来问问题了。
<标题>更新查询1的实际执行计划:
https://www.brentozar.com/pastetheplan/?id=SJd56RSmK
查询2实际执行计划:
https://www.brentozar.com/pastetheplan/?id=BJ7QHk87Y
我认为您需要记住这类查询的两个要点:
- 窗口函数性能良好的关键是不要引入不必要的排序。所以当你可以用
ROW_NUMBER
得到任何一个方向的一阶时,你不应该用另一个相反的ROW_NUMBER
来得到最后一个。而是使用LEAD
来检查下一行是否存在,从而告诉您这是否是最后一行。然后你可以使用条件聚合。 - 通常有两种方法来计算first/last:如上所述的行编号解决方案,或者
APPLY
,它会挑选出您需要的确切的行编号。
我认为对于OrderDetails
我们应该使用apply,因为我们只需要找到每个客户的两个订单。这确实需要良好的索引,所以如果OrderDetails
没有很好地索引,那么您可能也需要切换到行编号解决方案。
select
c.CustomerID,
c.FirstName + ' ' + c.LastName as Name,
cs.CustomerStatusDescription as Status,
ct.CustomerTypeDescription as Type,
pv.Volume80 as G3,
o.FirstOrderID,
o.FirstOrderDate,
o.FirstSubTotal,
o.FirstCountry,
fod.ItemCode as FirstItemCode,
fod.ItemDescription as FirstItemDescription,
fopt.PriceTypeDescription as FirstPriceTypeDescription,
o.LastOrderID,
o.LastOrderDate,
o.LastSubTotal,
o.LastCountry,
lod.ItemCode as LastItemCode,
lod.ItemDescription as LastItemDescription,
lopt.PriceTypeDescription as LastPriceTypeDescription
from Customers c
left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
and pv.PeriodTypeID = 2
and pv.PeriodID = (
select top 1 PeriodID
from Periods p
where p.PeriodTypeID = 2
and p.StartDate <= @now
and p.EndDate >= @now
)
left join (
select
o.CustomerID,
min(case when rn = 1 then OrderID end) as FirstOrderId,
min(case when rn = 1 then OrderDate end) as FirstOrderDate,
min(case when rn = 1 then SubTotal end) as FirstSubTotal,
min(case when rn = 1 then Country end) as FirstCountry,
min(case when nx is null then OrderID end) as LastOrderId,
min(case when nx is null then OrderDate end) as LastOrderDate,
min(case when nx is null then SubTotal end) as LastSubTotal,
min(case when nx is null then Country end) as LastCountry,
count(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then 1 end) as ThreeMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then BusinessVolumeTotal end) as ThreeMonthTotal,
count(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then 1 end) as SixMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then BusinessVolumeTotal end) as SixMonthTotal,
count(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then 1 end) as TwelveMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then BusinessVolumeTotal end) as TwelveMonthTotal
from (
select *,
ROW_NUMBER() over (partition by o.CustomerID order by OrderDate) as rn,
LEAD(OrderID) over (partition by o.CustomerID order by OrderDate) as nx
from Orders o
where o.OrderStatusID >= 7
and o.OrderTypeID in (1,4,8,11)
and o.OrderDate >= @timeAgo
) o
group by o.CustomerID
) o on o.CustomerID = c.CustomerID
outer apply (
select top 1
od.ItemCode,
od.ItemDescription
from OrderDetails od
order by od.BusinessVolume desc
where od.OrderID = o.FirstOrderId
) fod
outer apply (
select top 1
od.ItemCode,
od.ItemDescription
from OrderDetails od
order by od.BusinessVolume desc
where od.OrderID = o.LastOrderId
) lod
left join PriceTypes fopt on fopt.PriceTypeID = o.FirstPriceTypeID
left join PriceTypes lopt on lopt.PriceTypeID = o.LastPriceTypeID
where c.CustomerStatusID in (1,2)
and c.CustomerTypeID in (2,3);
我还将给你一个行编号的版本,从你的执行计划来看,它实际上可能更好。你需要同时尝试
select
c.CustomerID,
c.FirstName + ' ' + c.LastName as Name,
cs.CustomerStatusDescription as Status,
ct.CustomerTypeDescription as Type,
pv.Volume80 as G3,
o.FirstOrderID,
o.FirstOrderDate,
o.FirstSubTotal,
o.FirstCountry,
o.FirstItemCode,
o.FirstItemDescription,
o.FirstPriceTypeDescription,
o.LastOrderID,
o.LastOrderDate,
o.LastSubTotal,
o.LastCountry,
o.LastItemCode,
o.LastItemDescription,
o.LastPriceTypeDescription
from Customers c
left join CustomerTypes ct on ct.CustomerTypeID = c.CustomerTypeID
left join CustomerStatuses cs on cs.CustomerStatusID = c.CustomerStatusID
left join PeriodVolumes pv on pv.CustomerID = c.CustomerID
and pv.PeriodTypeID = 2
and pv.PeriodID = (
select top 1 PeriodID
from Periods p
where p.PeriodTypeID = 2
and p.StartDate <= @now
and p.EndDate >= @now
)
left join (
select
o.CustomerID,
min(case when rn = 1 then o.OrderID end) as FirstOrderId,
min(case when rn = 1 then o.OrderDate end) as FirstOrderDate,
min(case when rn = 1 then o.SubTotal end) as FirstSubTotal,
min(case when rn = 1 then o.Country end) as FirstCountry,
min(case when rn = 1 then od.ItemCode end) as FirstItemCode,
min(case when rn = 1 then od.ItemDescription end) as FirstItemDescription,
min(case when rn = 1 then opt.PriceTypeDescription end) as FirstPriceTypeDescription,
min(case when nx is null then o.OrderID end) as LastOrderId,
min(case when nx is null then o.OrderDate end) as LastOrderDate,
min(case when nx is null then o.SubTotal end) as LastSubTotal,
min(case when nx is null then o.Country end) as LastCountry,
min(case when nx is null then od.ItemCode end) as LastItemCode,
min(case when nx is null then od.ItemDescription end) as LastItemDescription,
min(case when nx is null then opt.PriceTypeDescription end) as LastPriceTypeDescription,
count(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then 1 end) as ThreeMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -3, GETDATE()) then BusinessVolumeTotal end) as ThreeMonthTotal,
count(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then 1 end) as SixMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -6, GETDATE()) then BusinessVolumeTotal end) as SixMonthTotal,
count(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then 1 end) as TwelveMonthCount,
sum(case when o.OrderDate >= DATEADD(month, -12, GETDATE()) then BusinessVolumeTotal end) as TwelveMonthTotal
from (
select *,
ROW_NUMBER() over (partition by o.CustomerID order by OrderDate) as rn,
LEAD(OrderID) over (partition by o.CustomerID order by OrderDate) as nx
from Orders o
where o.OrderStatusID >= 7
and o.OrderTypeID in (1,4,8,11)
and o.OrderDate >= @timeAgo
) o
left join PriceTypes opt on opt.PriceTypeID = o.PriceTypeID
join (
select *,
ROW_NUMBER() over (partition by od.OrderID order by od.BusinessVolume desc) as rn
from OrderDetails od
) od on od.OrderID = o.OrderId
where rn = 1 or nx is null
) o on o.CustomerID = c.CustomerID
where c.CustomerStatusID in (1,2)
and c.CustomerTypeID in (2,3);
良好的索引对于良好的性能至关重要。我希望在您的表上大致有以下索引,无论是集群索引还是非集群索引(集群索引INCLUDE
自动每隔一列),如果需要,显然可以添加其他INCLUDE
列:
Customers (CustomerID) INCLUDE (FirstName, LastName)
CustomerTypes (CustomerTypeID) INCLUDE (CustomerTypeDescription)
CustomerStatuses (CustomerStatusID) INCLUDE (CustomerTypeDescription)
PeriodVolumes (CustomerID) INCLUDE (Volume80)
Periods (PeriodTypeID, StartDate, PeriodID) INCLUDE (EndDate) -- can swap Start and End
Orders (CustomerID, OrderDate) INCLUDE (OrderStatusID, SubTotal, Country, BusinessVolumeTotal)
OrderDetails (OrderID, BusinessVolume) INCLUDE (ItemCode ItemDescription)
PriceTypes (PriceTypeID) INCLUDE (PriceTypeDescription)
您应该仔细考虑INNER
和LEFT
连接,因为优化器可以更容易地移动INNER
连接。
还要注意,DISTINCT
不是一个函数,它是在整个列集上计算的。一般来说,如果查询中有一个DISTINCT
,那么可以假设连接没有被正确地考虑过。