是否有更好的方法来识别序列模式的时间间隔边界?



我有一个付款表,其中有正值和负值(即捕获和贷记)。我需要找出从上次净额为正以来,我们收到净额为正的点。例如,如果客户支付了这些款项并收到了这些信用:

01/01  $100 <-
02/01 -$100
03/01 -$100
04/01  $100
05/01  $100
06/01  $100 <-

…然后点将是01/01和06/01:从02/01到04/01,他们有一个负的余额,到05/01,他们有一个零余额。

我目前的方法首先从所有捕获的日期中构建结束日期列表,然后计算每个日期的开始日期,最后计算这些时期的净捕获:

Start      End        NetCaptures
1900/01/01 2011/01/01  $100
2011/01/02 2011/04/01 -$100
2011/04/02 2011/05/01  $100
2011/05/02 2011/06/01  $100

然后丢弃netcapture值为0或更少的记录,重新计算开始日期,重新计算net capture,并重复直到没有记录要删除,留下这个。

Start      End        NetCaptures
1900/01/01 2011/01/01  $100
2011/01/02 2011/06/01  $100

有更好的方法吗?一些分析表达式的巧妙使用?这就接近RBAR了。实际上,它的运行速度是可以接受的(500K条记录10分钟,而在我开始以这种方式计算信用之前是1.5分钟)。

* RESULT *

虽然Microsoft确实支持优雅的滚动总数函数,但使用该思想,我最终编写了这样的代码:计算所有捕获,计算每次捕获之前的运行总数,并丢弃那些具有相等或更大运行总数的较早记录。

CREATE TABLE #Sequences
    (
    OrderID INT NOT NULL,
    Sequence    INT NOT NULL,
    PRIMARY KEY (OrderID, Sequence),
    StartDate   DATE NOT NULL DEFAULT '1900-01-01',
    EndDate DATE NOT NULL,
    CapturesThisPeriod  DECIMAL(18, 2) NOT NULL DEFAULT 0.00,
    )
INSERT INTO #Sequences (OrderID, Sequence, EndDate)
    SELECT OrderID, ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY DateReceived), DateReceived
    FROM Receipts
    WHERE Amount > 0.00
/* Calculate the start date for each period */
UPDATE S
SET StartDate = DATEADD(D, 1, Prev.EndDate)
FROM
    #Sequences AS S
    INNER JOIN #Sequences AS Prev ON S.OrderID = Prev.OrderID AND Prev.Sequence = S.Sequence - 1
/* Calculate the cumulative total for each period */
UPDATE M
SET CumulativeReceipts = R.Receipts
FROM
    #Sequences AS M
    INNER JOIN      
        (
        SELECT
            M.OrderID, M.Sequence, SUM(R.Amount) AS Receipts
        FROM
            #Sequences AS M
            INNER JOIN Receipts AS R ON M.OrderID = R.OrderID AND R.DateReceived <= M.EndDate
        GROUP BY
            M.OrderID, M.Sequence
        ) AS R ON M.OrderID = R.OrderID AND M.Sequence = R.Sequence
/* Delete sequences with do not represent net positive receipts */
DELETE FROM M
FROM #Sequences AS M
WHERE EXISTS (SELECT * FROM #Sequences AS Prev WHERE M.OrderID = Prev.OrderID AND Prev.Sequence < M.Sequence AND Prev.CumulativeReceipts >= M.CumulativeReceipts)
/* Recalculate sequence numbers and dates */
UPDATE S SET Sequence = NewSequence FROM (SELECT Sequence, ROW_NUMBER() OVER (PARTITION BY OrderID ORDER BY Sequence) AS NewSequence FROM #Sequences) AS S
UPDATE S
SET StartDate = DATEADD(D, 1, Prev.EndDate)
FROM
    #Sequences AS S
    INNER JOIN #Sequences AS Prev ON S.OrderID = Prev.OrderID AND Prev.Sequence = S.Sequence - 1
    END
/* Calculate net captures per period, and continue with analysis */

查找"running sum";例如:http://explainextended.com/2010/01/22/sql-server-running-totals

相关内容

  • 没有找到相关文章

最新更新