大型数据集上的SQL语句超时



试图完成一个web进程,我得到了错误canceling statement due to statement timeout。调试代码库时发现,由于数据集太大,下面的查询超时了。对于如何提高以下查询性能的任何建议,我将不胜感激。

select userid, max(recent_activity_date) recent_activity_date 
from (
SELECT id AS userid,
recent_logged_in AS recent_activity_date
FROM user
WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
UNION
SELECT userid AS userid, max(recentaccessed) AS recent_activity_date
FROM tokencreds
WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
GROUP BY userid
) recent_activity 
WHERE EXISTS(select 1 from user where id = userid and not deleted) 
group by userid 
order by userid;  

每个表的索引:

  • 用户user_recent_logged_in on user (recent_logged_in)
  • 标记:tokencreds_userid_token on tokencreds (userid, token)tokencreds_userid_token是唯一的

很大程度上取决于数据的"布局"。有很多唱片"热门"吗?用户中是否有大量记录?在象征意义上?等等。

就我个人而言,我会选择这个:

SELECT userid, max(recent_activity_date) recent_activity_date _user
FROM (
SELECT id AS userid, MAX(recent_logged_in) AS recent_activity_date
FROM user
WHERE recent_logged_in > now() - cast('10 days' AS INTERVAL)
AND NOT deleted
UNION ALL
SELECT userid AS userid, MAX(recentaccessed) AS recent_activity_date
FROM tokencreds
WHERE recentaccessed > now() - cast('10 days' AS INTERVAL)
AND EXISTS(SELECT * FROM user WHERE id = userid AND NOT deleted)
) recent_activity 
GROUP BY userid 
ORDER BY userid; 


-- indexes 'needed' on :
CREATE INDEX idx_userid_not_deleted ON user (userid) WHERE NOT deleted;
CREATE INDEX idx_recent_logged_in_user_id_not_deleted ON user (recent_logged_in, userid) WHERE not deleted;
CREATE INDEX idx_recentaccessed_user_id ON tokencreds (recentaccessed, userid);

而是YMMV为了得到更好的想法,您真的应该提供完整的EXPLAIN ANALYZE结果,否则我们只是盲目猜测。可能是系统会拒绝使用任何建议的索引,在这种情况下,你最好先删除,然后再偏离正轨。

推理:

  • UNION将在子选择上引起一个隐式的不同,而您并不真正需要它,因为稍后的MAX()GROUP BY会做同样的事情,为什么要做两次呢
  • 最好"尽快"过滤,而不是最终过滤IMHO(**(

**:请注意,这里的结果会有所不同!(但我认为我的"更好"(。例如,假设user_id 5 有3条记录

  • user_id=5,deleted=true,recent_activity_date=12月10日
  • user_id=5,deleted=false,recent_activity_date=12月8日
  • user_id=5,deleted=false,recent_activity_date=12月5日

忽略tokencreds表,user_id 5的结果在您的版本中为12月10日,而在我的版本中则为12月8日。检查您想要的要求!

编辑:建议索引中的错误

去掉unionexists(),并将它们组合成一个直的join:


SELECT x.userid
, GREATEST(x.recent_logged_in, x.recent_activity_date ) AS recent_activity_date
FROM (
SELECT u.id AS userid
, u.recent_logged_in
, MAX(t.recentaccessed) AS recent_activity_date
FROM users u
LEFT JOIN tokencreds AS t ON t.userid = u.id
WHERE NOT u.deleted  
AND (
u.recent_logged_in > now() - cast('10 days' AS INTERVAL)
OR t.recentaccessed > now() - cast('10 days' AS INTERVAL)
)
GROUP BY u.id
) x
ORDER by userid;  

相关内容

  • 没有找到相关文章

最新更新