MySQL查询优化拥有数百万条记录



目标:需要查询来统计当前公司之外当前公司中不存在的所有"不同"潜在客户。查询需要考虑多个表(lead_tails、域、公司)之间的数百万条记录

EXAMPLE: 
  company 1 -> domain 1 -> lead 1 lead_details records exists.
  company 2 -> domain 2 -> lead 1 lead_details records exists.  
  company 2 -> domain 2 -> lead 2 lead_details records exists. 
  company 3 -> domain 3 -> lead 2 lead_details records exists. 
  company 3 -> domain 3 -> lead 3 lead_details records exists.

结果:如果我对1号公司的上述数据进行查询,结果应该是(2),因为线索2&线索3是唯一的,在公司1 中不存在

domain_id domain_name company_id company_name lead_id lead_count
    "2"         "D2"        "2"       "C2"        "2"      "2"
    "3"         "D3"        "3"       "C3"        "3"      "1"

这是我的问题,如果有人有更好的建议,请告诉我。

SELECT al.*
FROM (
    SELECT 
    d.id AS domain_id, 
    d.name AS domain_name, 
    c.id AS company_id, 
    c.name AS company_name, 
    ld.lead_id, 
    count(ld.lead_id) as lead_count 
    FROM domains d 
    INNER JOIN company c
    ON (c.id = d.company_id AND c.id != 1)
    INNER JOIN lead_details ld 
    ON (ld.domain_id = d.id)
    GROUP BY ld.lead_id
) al 
LEFT JOIN (
    SELECT ld.lead_id FROM domains d 
    INNER JOIN company c
    ON (c.id = d.company_id AND c.id = 1)
    INNER JOIN lead_details ld 
    ON (ld.domain_id = d.id)
) ccl
ON al.lead_id = ccl.lead_id 
WHERE ccl.lead_id IS NULL;

我有将近一百万行,所以需要找出更好的解决方案。。

计划A

图案

FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...

效率低下,尤其是在旧版本的MySQL中。这是因为这两个子查询都没有任何索引,所以(在旧版本中)需要对其中一个子查询进行重复的全表扫描。

更好的方法是尝试重新表述为

FROM t1 ...
JOIN t2 ... ON ...
JOIN t3 ... ON ...
LEFT JOIN t4 ... ON ...
LEFT JOIN t5 ... ON ...

计划B

这更接近于你所拥有的。。。

CREATE TEMPORARY TABLE ccl
        ( INDEX(lead_id) )
    SELECT ... -- the stuff that is after LEFT JOIN

然后将该子查询仅替换为ccl。这提供了原始查询中缺少的索引。

计划C

汇总表。(这对您的查询可能实用,也可能不实用,因为您正在查找不同的和不存在的。)每个月(或每周或其他时间)计算上个月的小计,并将其存储到另一个表中。然后,对另一个表的查询会快得多。

最新更新