如果我有一个像下面这样的表:
<表类>
id_
last_updated_by
tbody><<tr>1 机器人 1人类 1机器人 2机器人 3 机器人 3 人类 表类>
如果我遵循你的问题,那么像这样的东西应该工作。
SELECT
t.*
,CASE WHEN a.id IS NOT NULL THEN 1 ELSE 0 END AS updated_by_human
FROM table t
LEFT JOIN (SELECT DISTINCT id FROM table WHERE last_updated_by = 'human') a ON t.id = a.id
处理updated_by_human字段,但是如果您还需要减少表中的记录(只保留一个子集),那么您需要更多的信息来做到这一点。
Exists
子句通常性能不高,但如果你的数据不是很大,这应该可以工作。
select id_,
IF (EXISTS (SELECT 1 FROM table_name t2 WHERE t2.last_updated_by = 'human' and t2.id_ = t1.id_), 1, 0) AS updated_by_human
from table_name t1;
这里有另一种方法
SELECT *
FROM table_name t1
GROUP BY ti.id_
HAVING COUNT(*) > 0
AND MAX(CASE t1.last_updated_by WHEN 'human' THEN 1 ELSE 0 END) = 1;
由于您没有指定用于确定该记录是给定id
添加的最新记录的列,因此我假设将有一列用于跟踪插入/修改时间戳(这是相当标准的表设计),让我们将其放置为last_updated_timestamp
(如果您没有任何,那么我仍然坚持您有一个,因为没有时间戳的审计跟踪是没有意义的)
给定表名是updating_trail
SELECT updating_trail.*, last_update_trail.modified_by_human
FROM updating_trail
INNER JOIN (
-- determine the id_, the lastest modified_timestamp, and a flag check to determine if there is any record with last_update_by is 'human' -> if yes then give 1
SELECT updating_trail.id_, MAX(last_update_timestamp) AS most_recent_update_ts, MAX(CASE WHEN updating_trail.last_updated_by = 'human' THEN 1 ELSE 0 END) AS modified_by_human
FROM updating_trail
GROUP BY updating_trail.id_
) last_update_trail
ON updating_trail.id_ = last_update_trail.id_ AND updating_trail.last_update_timestamp = last_update_trail.most_recent_update_ts;
为
id_ | last_updated_by | last_update_timestamp | modified_by_human | 1 | 机器人 | 2021 - 10 - 19 t20:00:00.000z | 1 | 2
---|---|---|---|
机器人 | 2021 - 10 - 19 t17:00:00.000z | 0 | |
3 | 机器人 | 2021 - 10 - 19 t16:00:00.000z | 1 |
这是您的查询到条件聚合的1:1翻译:
SELECT TRANSACTION_ID,
CREATED_DATE,
CREATED_BY_USER_ID,
OWNER_USER_ID,
Max(CASE
WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
CREATED_BY_USER_ID LIKE 'N%' OR
CREATED_BY_USER_ID IS NULL
THEN 0
ELSE 1
END) Over (PARTITION BY TRANSACTION_ID) AS CREATED_BY_HUMAN
FROM Table_Name
WHERE CREATED_DATE >= Cast('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= Cast('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY Row_Number() Over (PARTITION BY TRANSACTION_ID ORDER BY End_Dt DESC) = 1