How to write SQL for a self-referencing SELECT where some fields are NOT LIKE the same fields, plus some other conditions



I have the following (simplified) table:

CREATE TABLE IF NOT EXISTS `resource` (
`id`         INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
`host`       TEXT NOT NULL,
`inspecting` INTEGER DEFAULT 0,
`visitedAt`  TEXT
);

Multiple records can exist with the same host value and/or with a value that is considered a subdomain of host. For example:

id |      host       | inspecting |      visitedAt
---------------------------------------------------------
1 |     example.com |          0 |                null
2 |     example.com |          0 | 2020-09-28 00:00:00
3 | sub.example.com |          1 |                null
4 |     example.org |          0 |                null

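So these hosts may already have been visited, or may currently be under inspection. I want to find the oldest host that has not been visited recently, is not currently being inspected, and is also not considered a subdomain of a host that was recently visited or is currently being inspected.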

So, assuming example.com is currently being inspected or was recently visited, I don't want to match example.com or sub.example.com. In the sample data above, example.org should match.
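The "same host or subdomain" rule described above can be illustrated with a small check. This is only a sketch using Python's built-in sqlite3 module; the helper name `is_same_or_subdomain` is hypothetical and not part of the original schema. Keep in mind that LIKE treats `%` and `_` inside the host as wildcards and, by default in SQLite, compares ASCII letters case-insensitively.

```python
import sqlite3

# Hypothetical helper (not from the original post): asks SQLite whether
# `candidate` equals `parent` or ends with "." followed by `parent`.
def is_same_or_subdomain(conn, candidate, parent):
    row = conn.execute(
        "SELECT :candidate = :parent OR :candidate LIKE '%.' || :parent",
        {"candidate": candidate, "parent": parent},
    ).fetchone()
    return bool(row[0])

conn = sqlite3.connect(":memory:")
print(is_same_or_subdomain(conn, "sub.example.com", "example.com"))  # True
print(is_same_or_subdomain(conn, "example.org", "example.com"))      # False
```

The leading `'%.'` matters: without the dot, a host such as notexample.com would be treated as a subdomain of example.com.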

I have tried various queries with JOINs and WHERE (NOT) EXISTS, but I just can't get it to work.

The closest I got is something like this (though it may not be accurate):

SELECT `self`.*
FROM `resource` AS `self`
WHERE 
`self`.`inspecting` != 1 AND 
(`self`.`visitedAt` IS NULL OR datetime( `self`.`visitedAt` ) <= datetime( 'now', '-10 minutes' )) AND
NOT EXISTS (  
SELECT 1
FROM 
`resource` AS `probe`
WHERE
`probe`.`inspecting` = 1 AND 
(`self`.`host` = `probe`.`host` OR `self`.`host` LIKE "%." || `probe`.`host`) AND 
(`probe`.`visitedAt` IS NOT NULL AND datetime( `probe`.`visitedAt` ) > datetime( 'now', '-10 minutes' ))
)
ORDER BY `self`.`visitedAt` ASC
LIMIT 1

Is it possible to filter for such a row with a single query?

You can use a common table expression with subqueries in the SELECT list to create conditional columns, like this:

WITH `t` AS (
SELECT  
-- Create a conditional column `inspectionFlag`
CASE WHEN
(
-- Checks the `resource` table for a related host (same host, parent, or subdomain) that has inspecting = 1
SELECT 1 FROM `resource` AS `probe` 
WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
AND `probe`.`inspecting` = 1
) IS NOT NULL THEN 1 ELSE 0 END
AS `inspectionFlag`,
CASE 
WHEN     
-- Checks the `resource` table for a related host that was visited within the last 10 minutes
(SELECT 1 FROM `resource` AS `probe`
WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
AND `probe`.`visitedAt` IS NOT NULL
AND datetime( `probe`.`visitedAt` ) > datetime( 'now', '-10 minutes' )
) IS NOT NULL THEN 1 ELSE 0 END 
AS `visitedFlag`
, `self`.*
FROM `resource` AS `self`
) 
SELECT `t`.* FROM `t`

The result should now look like this:

| inspectionFlag | visitedFlag | id  | host            | inspecting | visitedAt           |
| -------------- | ----------- | --- | --------------- | ---------- | ------------------- |
| 1              | 1           | 1   | example.com     | 0          |                     |
| 1              | 1           | 2   | example.com     | 0          | 2020-09-28 08:00:00 |
| 1              | 1           | 3   | sub.example.com | 1          |                     |
| 0              | 0           | 4   | example.org     | 0          |                     |
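The flag columns above depend on SQLite evaluating a scalar subquery that matches no rows to NULL, so `(SELECT 1 ...) IS NOT NULL` behaves like an existence test. The following is a minimal sketch of that behavior with Python's built-in sqlite3 module. The helper `inspection_flag`, the trimmed-down table (no `visitedAt`), and the three sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down version of the `resource` table (assumption: no visitedAt).
conn.execute(
    "CREATE TABLE resource ("
    "id INTEGER PRIMARY KEY, host TEXT NOT NULL, inspecting INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO resource (host, inspecting) VALUES (?, ?)",
    [("example.com", 0), ("sub.example.com", 1), ("example.org", 0)],
)

# Hypothetical helper: computes the same kind of flag as `inspectionFlag`
# for a single host value.
def inspection_flag(conn, host):
    row = conn.execute(
        """
        SELECT CASE WHEN (
            -- Scalar subquery: yields 1 if any related row is being
            -- inspected, and NULL if no row matches.
            SELECT 1 FROM resource AS probe
            WHERE probe.inspecting = 1
              AND (probe.host = :host
                   OR probe.host LIKE '%.' || :host
                   OR :host LIKE '%.' || probe.host)
        ) IS NOT NULL THEN 1 ELSE 0 END
        """,
        {"host": host},
    ).fetchone()
    return row[0]

print(inspection_flag(conn, "example.com"))  # 1 (sub.example.com is inspected)
print(inspection_flag(conn, "example.org"))  # 0
```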

Now just use the new "flag" columns to filter out the hosts, like this:

WITH `t` AS (
SELECT  
CASE WHEN
(
SELECT 1 FROM `resource` AS `probe` 
WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
AND `probe`.`inspecting` = 1
) IS NOT NULL THEN 1 ELSE 0 END
AS `inspectionFlag`,
CASE 
WHEN     
(SELECT 1 FROM `resource` AS `probe`
WHERE (`probe`.`host` LIKE '%.' || `self`.`host` OR `self`.`host` LIKE '%.' || `probe`.`host` OR `self`.`host` = `probe`.`host`)
AND `probe`.`visitedAt` IS NOT NULL
AND datetime( `probe`.`visitedAt` ) > datetime( 'now', '-10 minutes' )
) IS NOT NULL THEN 1 ELSE 0 END 
AS `visitedFlag`
, `self`.*
FROM `resource` AS `self`
) 
SELECT `t`.* FROM `t`
WHERE `t`.`inspectionFlag` = 0 AND `t`.`visitedFlag` = 0

Result:

| inspectionFlag | visitedFlag | id  | host        | inspecting | visitedAt |
| -------------- | ----------- | --- | ----------- | ---------- | --------- |
| 0              | 0           | 4   | example.org | 0          |           |
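For comparison, the same filter can probably also be written with NOT EXISTS, which is closer to the attempt in the question. The sketch below is an alternative and not the query from this answer. It assumes the probe conditions should be combined with OR (inspecting or recently visited) and that the host relation should be checked in both directions. To keep the example reproducible, the second sample row is given a `visitedAt` of one minute ago instead of the fixed 2020 timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS resource (
        id         INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
        host       TEXT NOT NULL,
        inspecting INTEGER DEFAULT 0,
        visitedAt  TEXT
    )
""")
# Sample rows; the second visitedAt is "one minute ago" (assumption) so
# that it counts as a recent visit whenever this script runs.
conn.executescript("""
    INSERT INTO resource (host, inspecting, visitedAt) VALUES
        ('example.com',     0, NULL),
        ('example.com',     0, datetime('now', '-1 minutes')),
        ('sub.example.com', 1, NULL),
        ('example.org',     0, NULL);
""")

QUERY = """
SELECT self.*
FROM resource AS self
WHERE NOT EXISTS (
    SELECT 1
    FROM resource AS probe
    WHERE (self.host = probe.host
           OR self.host LIKE '%.' || probe.host
           OR probe.host LIKE '%.' || self.host)
      AND (probe.inspecting = 1
           OR (probe.visitedAt IS NOT NULL
               AND datetime(probe.visitedAt) > datetime('now', '-10 minutes')))
)
ORDER BY self.visitedAt ASC
LIMIT 1
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(4, 'example.org', 0, None)]
```

Because `probe` can be the same row as `self`, the subquery also excludes rows that are themselves being inspected or were recently visited, so no separate conditions on `self` are needed in this sketch.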
