我正在构建一个表格,其中显示多个记录包含相同btc
的实例,但对于不同的customer_names
,还向我显示了每个客户的最低cost
实例。
这个查询有效 - 但它效率非常低,并且在 80,000 行表上运行需要一分钟以上 - 所以我觉得我一定做错了什么。
select btc,customer_name,min(cost) from table where table.btc in
(select btc from table group by 1 having count(distinct customer_name) > 1)
group by 1,2
这将输出如下表:
+---------+---------------+---------+
| btc | customer_name | cost |
+---------+---------------+---------+
| asd32 | Sony | 1.45863 |
| asd32 | Nintendo | 1.84839 |
| bf33940 | Sony | 2.49188 |
| bf33940 | Nintendo | 2.49188 |
| a43c3f | Sony | 2.84142 |
| a43c3f | Nintendo | 2.45 |
| a43c3f | Sega | 2.689 |
+---------+---------------+---------+
我想更进一步,不包括两个customer_name
字段cost
相同的任何结果,(因此 - 从上表中删除 btc bf33940
,因为索尼和任天堂的成本相同(
我还想知道是否有更有效的方式来做我正在做的事情。
表结构
+------------------+--------------+------+-----+---------+
| field | type | null | key | default |
+------------------+--------------+------+-----+---------+
| btc | varchar(100) | NO | MUL | NULL |
| mpn | varchar(100) | YES | | NULL |
| supplier | varchar(100) | YES | | NULL |
| invoice | varchar(100) | YES | | NULL |
| invoice_date | datetime | YES | | NULL |
| qtr | varchar(5) | YES | | NULL |
| qty | double(10,0) | YES | | NULL |
| resale | double(15,5) | YES | | NULL |
| ext_resale | double(15,5) | YES | | NULL |
| cost | double(15,5) | YES | | NULL |
| ext_cost | double(15,5) | YES | | NULL |
| gpp | double(15,5) | YES | | NULL |
| project | varchar(100) | YES | | NULL |
| team | double(15,5) | YES | | NULL |
| build_type | varchar(50) | YES | | NULL |
| customer_name | varchar(100) | YES | | NULL |
| customer_address | varchar(100) | YES | | NULL |
| customer_type | varchar(100) | YES | | NULL |
| customer_group | varchar(100) | YES | | NULL |
| sps | varchar(100) | YES | | NULL |
| fps | varchar(100) | YES | | NULL |
| gps | varchar(100) | YES | | NULL |
| hps | varchar(100) | YES | | NULL |
+------------------+--------------+------+-----+---------+
此处的示例 CSV 文件:https://ufile.io/os0as
尝试用join
替换你的where...in
,尽管很难说没有测试的效率会提高多少。
像这样:
select t1.btc, customer_name, min(cost)
from xxx t1
join (
select btc
from xxx
group by btc
having count(*) > 1
) t2 on t1.btc = t2.btc
group by t1.btc, t1.customer_name
对于第二个问题,您可以进一步按btc和成本进行分组以删除重复项:
select t3.btc, group_concat(t3.customer_name), min_cost
from (
select t1.btc, t1.customer_name, min(cost) as min_cost
from xxx t1
join (
select btc
from xxx
group by btc
having count(distinct customer_name) > 1
) t2 on t1.btc = t2.btc
) t3
group by t1.btc, t1.cost
同样,很难说这是否会在没有测试的情况下起作用,但希望你明白了。
为了使其更快,我会为每个 btc 创建一个单独的表,并列出有多少客户拥有它,因此您无需创建具有 count((> 1 的临时表。