I have the following (simplified) table:
CREATE TABLE IF NOT EXISTS `resource` (
    `id` INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    `host` TEXT NOT NULL,
    `inspecting` INTEGER DEFAULT 0,
    `visitedAt` TEXT
);
Multiple records can exist with the same host value, and/or with values that are considered subdomains of a host. For example:
id | host | inspecting | visitedAt
---------------------------------------------------------
1 | example.com | 0 | null
2 | example.com | 0 | 2020-09-28 00:00:00
3 | sub.example.com | 1 | null
4 | example.org | 0 | null
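So these hosts may already have been visited, or may currently be under inspection. I want to find the oldest host that has not been visited recently and is not currently being inspected, and whose host is also not considered a subdomain of a host that was recently visited or is currently being inspected.
So, assuming example.com is currently being inspected or was recently visited, I don't want to match example.com or sub.example.com. In the data above, example.org should match.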
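I have tried various queries with JOINs and WHERE (NOT) EXISTS, but I just can't get it to work.
The closest query looks something like this (though it may not be accurate):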
SELECT `self`.*
FROM `resource` AS `self`
WHERE
    `self`.`inspecting` != 1 AND
    (`self`.`visitedAt` IS NULL OR datetime( `self`.`visitedAt` ) <= datetime( 'now', '-10 minutes' )) AND
    NOT EXISTS (
        SELECT 1
        FROM
            `resource` AS `probe`
        WHERE
            `probe`.`inspecting` = 1 AND
            (`self`.`host` = `probe`.`host` OR `self`.`host` LIKE "%." || `probe`.`host`) AND
            (`probe`.`visitedAt` IS NOT NULL AND datetime( `probe`.`visitedAt` ) > datetime( 'now', '-10 minutes' ))
    )
ORDER BY `self`.`visitedAt` ASC
LIMIT 1
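Is it possible to filter out such rows with a single query?
You can use a common table expression with subqueries in the SELECT list to create conditional columns, like this: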
WITH `t` AS (
    SELECT
        -- Create a conditional column `inspectionFlag`
        CASE WHEN
            (
                -- Look for a related host (same host, parent, or subdomain) that has inspecting = 1
                SELECT 1 FROM `resource` AS `probe`
                WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
                AND `probe`.`inspecting` = 1
            ) IS NOT NULL THEN 1 ELSE 0 END
        AS `inspectionFlag`,
        CASE
            WHEN
                -- Look for a related host that was visited within the last 10 minutes
                (SELECT 1 FROM `resource` AS `probe`
                WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
                AND `probe`.`visitedAt` IS NOT NULL
                AND datetime( `probe`.`visitedAt` ) > datetime( 'now', '-10 minutes' )
                ) IS NOT NULL THEN 1 ELSE 0 END
        AS `visitedFlag`
        , `self`.*
    FROM `resource` AS `self`
)
SELECT `t`.* FROM `t`
The result should now look like this (assuming row 2 was visited within the last 10 minutes):
| inspectionFlag | visitedFlag | id | host | inspecting | visitedAt |
| -------------- | ----------- | --- | --------------- | ---------- | ------------------- |
| 1 | 1 | 1 | example.com | 0 | |
| 1 | 1 | 2 | example.com | 0 | 2020-09-28 08:00:00 |
| 1 | 1 | 3 | sub.example.com | 1 | |
| 0 | 0 | 4 | example.org | 0 | |
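To see why rows 1 to 3 are all flagged while example.org is not, it can help to evaluate the host comparison on its own. Below is a minimal sketch using Python's built-in sqlite3 module; the helper name `related` is hypothetical and only mirrors the three-way comparison from the subqueries above.

```python
import sqlite3

# In-memory database; no table is needed to evaluate a plain expression.
conn = sqlite3.connect(":memory:")

def related(a: str, b: str) -> bool:
    """Return True if a and b are the same host, or one is a subdomain of the other.

    Hypothetical helper: it runs the same comparison as the flag subqueries,
    with explicit parentheses around each concatenated LIKE pattern.
    """
    row = conn.execute(
        "SELECT (:a = :b)"
        "    OR (:a LIKE ('%.' || :b))"
        "    OR (:b LIKE ('%.' || :a))",
        {"a": a, "b": b},
    ).fetchone()
    return bool(row[0])

print(related("example.com", "sub.example.com"))   # parent vs. subdomain
print(related("sub.example.com", "example.com"))   # subdomain vs. parent
print(related("example.org", "example.com"))       # unrelated hosts
print(related("notexample.com", "example.com"))    # shared suffix, but no dot boundary
```

The leading `'%.'` in the pattern is what keeps a host like notexample.com from being treated as a subdomain of example.com. Note that `_` and `%` inside a host value would act as LIKE wildcards, which this sketch does not guard against.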
Now just use the new "flag" columns to filter out the hosts, like this:
WITH `t` AS (
    SELECT
        CASE WHEN
            (
                SELECT 1 FROM `resource` AS `probe`
                WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
                AND `probe`.`inspecting` = 1
            ) IS NOT NULL THEN 1 ELSE 0 END
        AS `inspectionFlag`,
        CASE
            WHEN
                (SELECT 1 FROM `resource` AS `probe`
                WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
                AND `probe`.`visitedAt` IS NOT NULL
                AND datetime( `probe`.`visitedAt` ) > datetime( 'now', '-10 minutes' )
                ) IS NOT NULL THEN 1 ELSE 0 END
        AS `visitedFlag`
        , `self`.*
    FROM `resource` AS `self`
)
SELECT `t`.* FROM `t`
WHERE `t`.`inspectionFlag` = 0 AND `t`.`visitedFlag` = 0
Result:
| inspectionFlag | visitedFlag | id | host | inspecting | visitedAt |
| -------------- | ----------- | --- | ----------- | ---------- | --------- |
| 0 | 0 | 4 | example.org | 0 | |
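The question also asks for the oldest matching host, which the filter on its own does not select. One possible way to get that is to fold both flags into a single NOT EXISTS and add ORDER BY and LIMIT 1. The following is only a sketch, run through Python's built-in sqlite3 module against the sample data from the question. The function name `find_next` and the `:now` parameter are assumptions added here so the 10-minute window can be reproduced; in a live query you would use 'now' instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE IF NOT EXISTS resource (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    host TEXT NOT NULL,
    inspecting INTEGER DEFAULT 0,
    visitedAt TEXT
);
INSERT INTO resource (host, inspecting, visitedAt) VALUES
    ('example.com', 0, NULL),
    ('example.com', 0, '2020-09-28 00:00:00'),
    ('sub.example.com', 1, NULL),
    ('example.org', 0, NULL);
""")

# A row is rejected if ANY related row (same host, parent, or subdomain,
# including the row itself) is being inspected or was visited recently.
QUERY = """
SELECT self.id, self.host
FROM resource AS self
WHERE NOT EXISTS (
    SELECT 1
    FROM resource AS probe
    WHERE (self.host = probe.host
           OR self.host LIKE ('%.' || probe.host)
           OR probe.host LIKE ('%.' || self.host))
      AND (probe.inspecting = 1
           OR datetime(probe.visitedAt) > datetime(:now, '-10 minutes'))
)
ORDER BY self.visitedAt ASC, self.id ASC  -- NULLs (never visited) sort first
LIMIT 1
"""

def find_next(now: str):
    """Return (id, host) of the next candidate, or None if nothing qualifies."""
    return conn.execute(QUERY, {"now": now}).fetchone()

print(find_next("2020-09-28 00:05:00"))  # (4, 'example.org')
```

Like the flag columns above, this treats a host as blocked in both directions (a parent blocks its subdomains and a subdomain blocks its parent). If only the subdomain direction is wanted, the `probe.host LIKE ...` line can be dropped.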