Optimizing joins, sums, and subqueries



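I am building a Tinder clone for a research project, and I am trying to do something that is conceptually very simple, but my query seems to be far too heavy.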

Data structure

I created this simple fiddle to visualize the database structure.
I tried putting indexes on user.id, user.gender, user.orientation, match.user1, match.user2, and match.createdAt, but with no luck.

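Expected result

I want to find the people with the fewest matches, filtered by gender, orientation, last login, and calendar date.
A user must not take part in more than 4 matches in 24 hours, so I look for users who have <= 3 matches in the last 24 hours.

The values below are hard-coded to make the query easier to edit, since I have not taken the time to do that part yet.

A match is made up of 2 users (user1 and user2).
The limit of 4 matches per day applies to the sum of a user's appearances as user1 and as user2.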

SELECT total_sum, userId
FROM (
SELECT u.id as userId, u.orientation as userOrientation, u.gender as userGender, m1.sum1, m2.sum2, (m1.sum1 + m2.sum2) AS total_sum
FROM user u
INNER JOIN (
SELECT user1, COUNT(user1) as sum1 
FROM `match` 
WHERE createdAt > DATE('2017-12-11 00:00:00') 
GROUP BY user1
) m1
ON m1.user1 = u.id
INNER JOIN (
SELECT user2, COUNT(user1) as sum2 
FROM `match` 
WHERE createdAt > DATE('2017-12-11 00:00:00') 
GROUP BY user2
) m2
ON m2.user2 = u.id
WHERE u.gender IN ('female')
AND u.orientation IN ('hetero', 'bi')
AND u.lastLogin > 1512873464582
) as total
WHERE total_sum < 4
ORDER BY total_sum ASC
LIMIT 8

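Problem

For small tables the query takes a few milliseconds, but for medium-sized tables (50k users, 200k matches) it takes far too long (170 seconds).

Optimization

Following @Thorsten Kettner's answer, this is the explain plan for his query when I ran it against my test database after creating the indexes he suggested: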

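Solution

I ended up doing something simpler.
First, I flattened my match table by removing the user2 column. It doubled in size, because 1 match now becomes 2 rows, but this lets me do something very simple and very efficient with proper indexes. The first query handles users with no matches, and the second query handles users who have matches. I no longer put matchesLimit in the query, because it adds extra work for MySQL; I only need to check the first result to see whether its match count is <= 3.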

(SELECT u.id, mc.id as nb_match, u.gender, u.orientation
FROM user u 
LEFT JOIN match_composition mc 
ON (mc.matchedUser = u.id AND mc.createdAt > DATE('2017-12-11 00:00:00'))
WHERE u.lastLogin > 1512931740721 
AND u.orientation IN ('bi', 'hetero')
AND u.gender IN ('female')
AND mc.id IS NULL
ORDER BY u.lastLogin DESC)
UNION ALL 
(SELECT u.id, count(mc.id) as nb_match, u.gender, u.orientation
FROM match_composition mc
JOIN user u 
ON u.id = matchedUser
WHERE mc.createdAt > DATE('2017-12-11 00:00:00')
AND u.lastLogin > 1512931740721
AND u.orientation IN ('bi', 'hetero')
AND u.gender IN ('female')
GROUP BY matchedUser
ORDER BY nb_match ASC
LIMIT 8)
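
For readers who want to try the flattening step, here is a minimal sketch of how the two-column match table could be turned into a one-row-per-participant table. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the syntax is close but not identical. The matchId column, the index name idx_mc_user_date, and the sample rows are assumptions for illustration; the post itself only shows the id, matchedUser, and createdAt columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `match` (
    id INTEGER PRIMARY KEY,
    user1 INTEGER NOT NULL,
    user2 INTEGER NOT NULL,
    createdAt TEXT NOT NULL
);
INSERT INTO `match` (user1, user2, createdAt) VALUES
    (1, 2, '2017-12-11 09:00:00'),
    (3, 1, '2017-12-11 10:00:00');

-- Assumed layout of the flattened table: one row per (match, participant).
CREATE TABLE match_composition (
    id INTEGER PRIMARY KEY,
    matchId INTEGER NOT NULL,
    matchedUser INTEGER NOT NULL,
    createdAt TEXT NOT NULL
);

-- Each original match contributes two rows, one per participant.
INSERT INTO match_composition (matchId, matchedUser, createdAt)
SELECT id, user1, createdAt FROM `match`
UNION ALL
SELECT id, user2, createdAt FROM `match`;

-- Hypothetical index supporting the per-user, per-day count.
CREATE INDEX idx_mc_user_date ON match_composition (matchedUser, createdAt);
""")

# Count today's matches per user with a single GROUP BY, no user1/user2 split.
rows = conn.execute(
    "SELECT matchedUser, COUNT(*) FROM match_composition "
    "WHERE createdAt > '2017-12-11' "
    "GROUP BY matchedUser ORDER BY matchedUser"
).fetchall()
print(rows)
```

The trade-off is the one described above: the table doubles in size, but the count no longer needs two subqueries or an OR across two columns.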

Thanks for your help.

A user can be matched either as user1 or as user2. We can use UNION ALL to get one record per user for each match:

select user1 as userid from `match` union all select user2 as userid from `match`;

The complete query:

select
u.id as userid,
coalesce(um.total, 0) as total
from user u
left join
(
select userid, count(*) as total
from 
(
select user1 as userid from `match` where createdat > date '2017-12-11'
union all 
select user2 as userid from `match` where createdat > date '2017-12-11'
) m
group by userid
) um on um.userid = u.id
where u.gender IN ('female')
and u.orientation in ('hetero', 'bi')
and u.lastlogin > 1512873464582
and coalesce(um.total, 0) < 4
order by coalesce(um.total, 0);

For this you would have the following indexes:

create index idx_m1 on `match` (createdat, user1);
create index idx_m2 on `match` (createdat, user2);
create index idx_u on user (lastlogin, gender, orientation, id);
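
To check the logic of this approach on a tiny data set, the following sketch runs the same query shape with Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here, so it says nothing about MySQL's execution plan or timings. The sample users, the sample matches, and the candidates helper are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, gender TEXT,
                   orientation TEXT, lastLogin INTEGER);
CREATE TABLE `match` (id INTEGER PRIMARY KEY, user1 INTEGER,
                      user2 INTEGER, createdAt TEXT);
CREATE INDEX idx_m1 ON `match` (createdAt, user1);
CREATE INDEX idx_m2 ON `match` (createdAt, user2);
CREATE INDEX idx_u ON user (lastLogin, gender, orientation, id);

INSERT INTO user VALUES
    (1, 'female', 'hetero', 1512900000000),
    (2, 'female', 'bi', 1512900000000),
    (3, 'male', 'hetero', 1512900000000),
    (4, 'female', 'hetero', 1512900000000);

-- User 1 reaches 4 matches today, user 2 has 1, user 4 only has an old one.
INSERT INTO `match` (user1, user2, createdAt) VALUES
    (1, 3, '2017-12-11 08:00:00'),
    (3, 1, '2017-12-11 09:00:00'),
    (1, 2, '2017-12-11 10:00:00'),
    (1, 3, '2017-12-11 11:00:00'),
    (4, 3, '2017-12-10 20:00:00');
""")

QUERY = """
SELECT u.id AS userid, COALESCE(um.total, 0) AS total
FROM user u
LEFT JOIN (
    SELECT userid, COUNT(*) AS total
    FROM (
        SELECT user1 AS userid FROM `match` WHERE createdAt > :since
        UNION ALL
        SELECT user2 AS userid FROM `match` WHERE createdAt > :since
    ) m
    GROUP BY userid
) um ON um.userid = u.id
WHERE u.gender IN ('female')
  AND u.orientation IN ('hetero', 'bi')
  AND u.lastLogin > :last_login
  AND COALESCE(um.total, 0) < 4
ORDER BY COALESCE(um.total, 0), u.id
LIMIT 8
"""

def candidates(conn, since, last_login):
    """Return (userid, total) pairs for users still under the daily limit."""
    params = {"since": since, "last_login": last_login}
    return conn.execute(QUERY, params).fetchall()

result = candidates(conn, "2017-12-11", 1512873464582)
print(result)
```

Because of the LEFT JOIN and COALESCE, user 4 (no matches today) is returned with a total of 0, while user 1 (already at 4 matches) is filtered out.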

I think your SQL skills are fine. This is what I came up with:

SELECT u.id as userId, 
u.orientation as userOrientation, 
u.gender as userGender, 
count(m.user1) total_sum
FROM user u
LEFT JOIN `match` m on (u.id in (m.user1, m.user2) 
and m.createdAt > DATE('2017-12-11 00:00:00'))
WHERE u.gender IN ('female')
AND u.orientation IN ('hetero', 'bi')
AND u.lastLogin > 1512873464582
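-- group per user so each user gets their own count; < 4 matches the limit in the question
GROUP BY u.id, u.orientation, u.gender
HAVING count(m.user1) < 4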
ORDER BY total_sum ASC
LIMIT 8;

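Edit: this covers the case of users with no matches.

Try indexing the match table columns user1 and user2, as well as the user table columns (or column combinations) that you use in your filters (e.g. gender), and see what gives better performance.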

Based on what you have provided, I would create indexes on:
- match.user1
- match.user2
- match.createdAt
- user.id (unique, probably the PK)
- user.lastLogin

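I would also try replacing COUNT(user1) with COUNT(*), but it probably won't make much difference.

Indexes on user.gender and user.orientation are probably useless: the efficiency of an index is proportional to the variance of its underlying values. On a field with only 2-3 distinct values, the cost of maintaining the index outweighs its usefulness.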
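
One rough way to check this before creating an index is to compare the number of distinct values in a column with the number of rows. The sketch below does that with Python's built-in sqlite3 module on synthetic data. The selectivity helper, the generated rows, and the 'male' and 'homo' values are assumptions for illustration and do not come from the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, gender TEXT, "
    "orientation TEXT, lastLogin INTEGER)"
)

# Synthetic users: 2 genders, 3 orientations, a unique lastLogin each.
genders = ["female", "male"]
orientations = ["hetero", "bi", "homo"]
rows = [
    (i, genders[i % 2], orientations[i % 3], 1512873464582 + i)
    for i in range(1, 1001)
]
conn.executemany("INSERT INTO user VALUES (?, ?, ?, ?)", rows)

def selectivity(conn, table, column):
    """Distinct values divided by row count; values near 1.0 index well.

    Table and column names are interpolated directly, so pass only
    trusted identifiers.
    """
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total

for col in ("gender", "orientation", "lastLogin"):
    print(col, selectivity(conn, "user", col))
```

On this synthetic table, gender and orientation come out close to 0 while lastLogin comes out at 1.0, which is consistent with indexing lastLogin rather than the two low-cardinality columns on their own.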

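As for the query itself, try the following. I tried to force the filtering on user to happen before the join with match, in case the query optimizer does not do its job properly (I have little experience with non-MS databases).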

SELECT total_sum, userId
FROM (SELECT u.id as userId, u.orientation as userOrientation, u.gender as userGender, m1.sum1, m2.sum2, (m1.sum1 + m2.sum2) AS total_sum
FROM (SELECT * FROM user 
WHERE gender = 'female'
AND orientation IN ('hetero', 'bi')
AND lastLogin > 1512873464582
) u
INNER JOIN (SELECT user1, COUNT(*) as sum1 
FROM `match` 
WHERE createdAt > DATE('2017-12-11 00:00:00') 
GROUP BY user1
) m1 ON m1.user1 = u.id
INNER JOIN (SELECT user2, COUNT(*) as sum2 
FROM `match` 
WHERE createdAt > DATE('2017-12-11 00:00:00') 
GROUP BY user2
) m2 ON m2.user2 = u.id
) as total
WHERE total_sum < 4
ORDER BY total_sum ASC
LIMIT 8
