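I have a SQLite table of security prices, with a timestamp for when each price was recorded. I'm trying to write a query that takes an arbitrary price_datetime input, for example 2022-08-10 19:00:00.000000, and pulls the latest price for each security_id with price_datetime <= the input.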
Here is a sample table:
┌─────────────┬───────┬────────────────────────────┐
│ security_id │ price │ price_datetime             │
├─────────────┼───────┼────────────────────────────┤
│ 4           │ 90.0  │ 2022-08-11 20:00:00.000000 │
│ 1           │ 100.0 │ 2022-08-10 20:00:00.000000 │
│ 1           │ 90.0  │ 2022-08-10 19:00:00.000000 │
│ 2           │ 95.0  │ 2022-08-10 18:00:00.000000 │
│ 2           │ 90.0  │ 2022-08-09 20:00:00.000000 │
│ 1           │ 90.0  │ 2022-08-09 20:00:00.000000 │
│ 3           │ 90.0  │ 2022-08-08 20:00:00.000000 │
└─────────────┴───────┴────────────────────────────┘
An input of 2022-08-10 19:00:00.000000 would return the following:
┌─────────────┬───────┬────────────────────────────┐
│ security_id │ price │ price_datetime             │
├─────────────┼───────┼────────────────────────────┤
│ 4           │ null  │ null                       │
│ 1           │ 90.0  │ 2022-08-10 19:00:00.000000 │
│ 2           │ 95.0  │ 2022-08-10 18:00:00.000000 │
│ 3           │ 90.0  │ 2022-08-08 20:00:00.000000 │
└─────────────┴───────┴────────────────────────────┘
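security_id 4 returns null because it has no price record with price_datetime <= 2022-08-10 19:00:00.000000. The query should also be flexible enough to return the last 1, 2, etc. records with price_datetime <= the input, not only the single latest record.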
Code to generate the sample table:
CREATE TABLE prices (
    security_id INTEGER,
    price FLOAT NOT NULL,
    price_datetime DATETIME NOT NULL
);
CREATE INDEX ix_prices_price_datetime on "prices" (price_datetime);
INSERT INTO prices VALUES(1,100,'2022-08-10 20:00:00.000000'),(1,90,'2022-08-10 19:00:00.000000'),(1,90,'2022-08-09 20:00:00.000000'),(2,90,'2022-08-09 20:00:00.000000'),(2,95,'2022-08-10 18:00:00.000000'),(3,90,'2022-08-08 20:00:00.000000'),(4,90,'2022-08-11 20:00:00.000000');
SELECT * FROM prices ORDER BY price_datetime DESC;
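You need to LEFT join the distinct security_ids to the table, so that a security with no qualifying price still appears in the result with nulls.
For the last price of each security_id, the simplest approach is aggregation, using SQLite's bare columns feature: when the query contains a single MAX(), the bare column price is taken from the same row that holds the maximum price_datetime: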
SELECT s.security_id, p.price, MAX(p.price_datetime) price_datetime
FROM (SELECT DISTINCT security_id FROM prices) s
LEFT JOIN prices p ON p.security_id = s.security_id AND p.price_datetime <= '2022-08-10 19:00:00.000000'
GROUP BY s.security_id;
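To pass the cutoff in as an input instead of editing the SQL text, the same query can take a bound parameter. The following is a minimal sketch using Python's standard sqlite3 module against an in-memory copy of the sample table. The names make_sample_db and latest_prices are illustrative and not part of the answer, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows copied from the question's INSERT statement.
SAMPLE_ROWS = [
    (1, 100, "2022-08-10 20:00:00.000000"),
    (1, 90, "2022-08-10 19:00:00.000000"),
    (1, 90, "2022-08-09 20:00:00.000000"),
    (2, 90, "2022-08-09 20:00:00.000000"),
    (2, 95, "2022-08-10 18:00:00.000000"),
    (3, 90, "2022-08-08 20:00:00.000000"),
    (4, 90, "2022-08-11 20:00:00.000000"),
]

# Same bare-column aggregation as above, with "?" in place of the literal.
LATEST_SQL = """
SELECT s.security_id, p.price, MAX(p.price_datetime) AS price_datetime
FROM (SELECT DISTINCT security_id FROM prices) s
LEFT JOIN prices p
  ON p.security_id = s.security_id AND p.price_datetime <= ?
GROUP BY s.security_id
ORDER BY s.security_id
"""


def make_sample_db():
    """Build an in-memory database holding the sample prices table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prices ("
        "security_id INTEGER, "
        "price FLOAT NOT NULL, "
        "price_datetime DATETIME NOT NULL)"
    )
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def latest_prices(conn, cutoff):
    """Return one (security_id, price, price_datetime) row per security."""
    return conn.execute(LATEST_SQL, (cutoff,)).fetchall()


conn = make_sample_db()
for row in latest_prices(conn, "2022-08-10 19:00:00.000000"):
    print(row)
```

Security 4 comes back as (4, None, None). Because these timestamps are stored as text, the <= is a string comparison, so the cutoff has to use exactly the same format as the stored values: a malformed value such as 19:000:00 sorts before 19:00:00 and silently excludes the 19:00 row.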
For the last n rows, use the ROW_NUMBER() window function:
WITH cte AS (
  SELECT s.security_id, p.price, p.price_datetime,
         ROW_NUMBER() OVER (PARTITION BY s.security_id ORDER BY p.price_datetime DESC) rn
  FROM (SELECT DISTINCT security_id FROM prices) s
  LEFT JOIN prices p ON p.security_id = s.security_id AND p.price_datetime <= '2022-08-10 19:00:00.000000'
)
SELECT security_id, price, price_datetime
FROM cte
WHERE rn <= n; -- replace n with the number that you want
See the demo.
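Both the cutoff and n can be supplied as bound parameters rather than typed into the query. The sketch below assumes Python's standard sqlite3 module and a bundled SQLite new enough for window functions (3.25 or later). The function name last_n_prices and the parameter names :cutoff and :n are illustrative, and the final ORDER BY is added for readable output.

```python
import sqlite3

# ROW_NUMBER() query from above with named placeholders for both inputs.
LAST_N_SQL = """
WITH cte AS (
  SELECT s.security_id, p.price, p.price_datetime,
         ROW_NUMBER() OVER (
           PARTITION BY s.security_id ORDER BY p.price_datetime DESC
         ) AS rn
  FROM (SELECT DISTINCT security_id FROM prices) s
  LEFT JOIN prices p
    ON p.security_id = s.security_id AND p.price_datetime <= :cutoff
)
SELECT security_id, price, price_datetime
FROM cte
WHERE rn <= :n
ORDER BY security_id, price_datetime DESC
"""


def last_n_prices(conn, cutoff, n):
    """Return up to n most recent rows per security at or before cutoff."""
    return conn.execute(LAST_N_SQL, {"cutoff": cutoff, "n": n}).fetchall()


# In-memory copy of the question's sample table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices ("
    "security_id INTEGER, "
    "price FLOAT NOT NULL, "
    "price_datetime DATETIME NOT NULL)"
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (1, 100, "2022-08-10 20:00:00.000000"),
        (1, 90, "2022-08-10 19:00:00.000000"),
        (1, 90, "2022-08-09 20:00:00.000000"),
        (2, 90, "2022-08-09 20:00:00.000000"),
        (2, 95, "2022-08-10 18:00:00.000000"),
        (3, 90, "2022-08-08 20:00:00.000000"),
        (4, 90, "2022-08-11 20:00:00.000000"),
    ],
)

for row in last_n_prices(conn, "2022-08-10 19:00:00.000000", 2):
    print(row)
```

With n set to 2, securities 1 and 2 each return two rows, security 3 returns its single row, and security 4 still appears once with nulls because the LEFT JOIN produces one unmatched row that gets rn = 1.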
WITH
  query_input(price_datetime, row_count) AS (
    VALUES (
      '2022-08-10 19:00:00.000000', -- @price_datetime,
      2                             -- @row_count
    )
  ),
  -- Remove this CTE when running against your db containing the actual "prices" table.
  prices(security_id, price, price_datetime) AS (
    VALUES
      (1, 100, '2022-08-10 20:00:00.000000'),
      (1, 90, '2022-08-10 19:00:00.000000'),
      (1, 90, '2022-08-09 20:00:00.000000'),
      (2, 90, '2022-08-09 20:00:00.000000'),
      (2, 95, '2022-08-10 18:00:00.000000'),
      (3, 90, '2022-08-08 20:00:00.000000'),
      (4, 90, '2022-08-11 20:00:00.000000')
  ),
  -- As you requested to show every security, you need to collect all ids.
  -- I assume you define a table "securities" containing column "id", so that
  -- prices.security_id => securities.id. Remove the following query when
  -- running against your db and adjust the "ids" CTE below if necessary.
  securities(id) AS (VALUES (1), (2), (3), (4)),
  ids(id) AS (
    -- Adjust "id" and "securities" to match your schema
    SELECT id FROM securities
  ),
  filtered_prices AS (
    SELECT prices.*
    FROM prices, query_input
    WHERE prices.price_datetime <= query_input.price_datetime
  ),
  label_prices AS (
    SELECT
      fp.*,
      row_number() OVER (PARTITION BY fp.security_id ORDER BY fp.price_datetime DESC) AS group_id
    FROM filtered_prices AS fp
  ),
  last_n_existing_prices AS (
    SELECT lp.*
    FROM label_prices AS lp, query_input AS qi
    WHERE lp.group_id <= qi.row_count
  ),
  last_n_prices AS (
    SELECT ids.id AS security_id, lp.price, lp.price_datetime
    FROM ids
    LEFT JOIN last_n_existing_prices AS lp
      ON ids.id = lp.security_id
  )
-- Sort in the outer query: row order inside a CTE is not guaranteed to carry through.
SELECT * FROM last_n_prices
ORDER BY security_id, price_datetime DESC;
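The @price_datetime and @row_count comments suggest the two values in query_input are meant to be supplied as parameters. A hedged sketch of that idea with Python's standard sqlite3 module follows. It is a condensed variant of the query, not the full original: it uses the documented :name placeholder style, reads from a real prices table, and, as an assumption, derives ids from prices itself because the question's schema has no securities table. The function name query_last_n is illustrative.

```python
import sqlite3

# Condensed variant: both inputs are bound once, in the query_input CTE.
QUERY = """
WITH
  query_input(price_datetime, row_count) AS (
    VALUES (:price_datetime, :row_count)
  ),
  ids(id) AS (
    -- Assumption: no separate "securities" table, so ids come from prices.
    SELECT DISTINCT security_id FROM prices
  ),
  label_prices AS (
    SELECT p.*,
           row_number() OVER (
             PARTITION BY p.security_id ORDER BY p.price_datetime DESC
           ) AS group_id
    FROM prices AS p, query_input AS qi
    WHERE p.price_datetime <= qi.price_datetime
  ),
  last_n_existing_prices AS (
    SELECT lp.*
    FROM label_prices AS lp, query_input AS qi
    WHERE lp.group_id <= qi.row_count
  )
SELECT ids.id AS security_id, lp.price, lp.price_datetime
FROM ids
LEFT JOIN last_n_existing_prices AS lp ON ids.id = lp.security_id
ORDER BY ids.id, lp.price_datetime DESC
"""


def query_last_n(conn, price_datetime, row_count):
    """Run QUERY with both inputs bound by name."""
    params = {"price_datetime": price_datetime, "row_count": row_count}
    return conn.execute(QUERY, params).fetchall()


# In-memory copy of the question's sample table.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE prices ("
    "security_id INTEGER, "
    "price FLOAT NOT NULL, "
    "price_datetime DATETIME NOT NULL)"
)
db.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (1, 100, "2022-08-10 20:00:00.000000"),
        (1, 90, "2022-08-10 19:00:00.000000"),
        (1, 90, "2022-08-09 20:00:00.000000"),
        (2, 90, "2022-08-09 20:00:00.000000"),
        (2, 95, "2022-08-10 18:00:00.000000"),
        (3, 90, "2022-08-08 20:00:00.000000"),
        (4, 90, "2022-08-11 20:00:00.000000"),
    ],
)

for row in query_last_n(db, "2022-08-10 19:00:00.000000", 1):
    print(row)
```

Keeping the inputs in a single query_input CTE means each value is bound once and can be referenced from any later CTE, which is the main design point of this answer.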