Normalizing data in an SQL query



I have an SQL query A (see below for more details) that returns a table like this:

cluster  brand  amount
0         bos     600
0         phi     300
0         har     100
1         pro    2500
1         wal    1500
1         ash    1000
2         dil    4200
2         sor     500
2         van     300
...

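However, instead of the amount itself, I want to show the ratio of that amount to the total amount in its cluster, as in the following table: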

cluster  brand  amount
0         bos    0.60
0         phi    0.30
0         har    0.10
1         pro    0.50
1         wal    0.30
1         ash    0.20
2         dil    0.84
2         sor    0.10
2         van    0.06
...

How should I change the SQL so that I can access the sum of all amounts in a cluster and still get multiple rows for that same cluster?

**Details**

SQL server: MySQL, accessed through the python-MySQL connector interface.

The current SQL query that produces the first table:

SELECT c.cluster, brand, COUNT(o.id) AS brand_amount
FROM nyon_all.clustering AS c 
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id 
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid 
LEFT JOIN nyon_all.articles AS a ON o.aid = a.id 
LEFT JOIN nyon_all.brands AS ab ON a.brand_id = ab.id 
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35'
GROUP BY cluster, brand 
HAVING brand_amount > 100
ORDER BY c.cluster ASC, brand_amount DESC;

orders (primary key id) links persons (foreign key pid) to articles (foreign key aid). Each article has a brand (foreign key brand_id), which maps to a name in the table brands.

The total number of articles per cluster can be retrieved with the following SQL query:

SELECT c.cluster, COUNT(o.pid) AS amount
FROM nyon_all.clustering AS c 
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id 
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid 
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35'
GROUP BY cluster
ORDER BY c.cluster ASC, amount DESC;

Result:

cluster amount
0        1000
1        5000
2        5000

However, I can't seem to combine these two SQL queries.

You can join against a subquery that sums the amounts per cluster:

select t1.cluster, t1.brand, t1.amount / s.sumAmount as amount
from Table1 t1
join (select cluster, sum(amount) as sumAmount
      from Table1
      group by cluster)s
on t1.cluster = s.cluster

See the SqlFiddle.
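As a self-contained check of this idea, here is a minimal Python sketch that runs the same kind of join against the sample rows from the question. It uses the standard-library sqlite3 module with an in-memory database purely as a stand-in for MySQL (an assumption, so that the example runs without a server). Table1 and its columns come from the query above; the helper name cluster_shares and the output alias share are hypothetical. SQLite divides integers as integers, so the sketch multiplies by 1.0, which MySQL's `/` operator does not need.

```python
import sqlite3

# Sample rows from the question: (cluster, brand, amount).
ROWS = [
    (0, "bos", 600), (0, "phi", 300), (0, "har", 100),
    (1, "pro", 2500), (1, "wal", 1500), (1, "ash", 1000),
    (2, "dil", 4200), (2, "sor", 500), (2, "van", 300),
]

# Join every row to the per-cluster total computed in the subquery.
QUERY = """
SELECT t1.cluster, t1.brand,
       t1.amount * 1.0 / s.sumAmount AS share  -- * 1.0 is only needed in SQLite
FROM Table1 AS t1
JOIN (SELECT cluster, SUM(amount) AS sumAmount
      FROM Table1
      GROUP BY cluster) AS s
  ON t1.cluster = s.cluster
ORDER BY t1.cluster ASC, share DESC
"""


def cluster_shares(rows):
    """Return (cluster, brand, share) tuples, with share rounded to 2 decimals."""
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
    try:
        conn.execute(
            "CREATE TABLE Table1 (cluster INTEGER, brand TEXT, amount INTEGER)"
        )
        conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
        return [(c, b, round(share, 2)) for c, b, share in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    for row in cluster_shares(ROWS):
        print(row)
```

Because the subquery returns exactly one row per cluster, the join keeps every brand row and only attaches the matching total to it.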

Edit

SELECT 
    c.cluster, 
    brand, 
    -- NULLIF turns a zero total into NULL, so the result is NULL instead of a division by 0
    COUNT(o.id) / NULLIF(s.sumBrandAmount, 0) AS brand_amount
FROM nyon_all.clustering AS c 
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id 
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid 
LEFT JOIN nyon_all.articles AS a ON o.aid = a.id 
LEFT JOIN nyon_all.brands AS ab ON a.brand_id = ab.id 
-- total number of orders per cluster, restricted to the same clustering round
LEFT JOIN (SELECT c1.cluster, COUNT(o1.id) AS sumBrandAmount
           FROM nyon_all.clustering AS c1
           LEFT JOIN nyon_all.persons AS p1 ON p1.id = c1.pid
           LEFT JOIN nyon_all.orders AS o1 ON o1.pid = p1.id
           WHERE c1.cluster_round = 'Org_2014-08-27_10:45:35'
           GROUP BY c1.cluster) AS s
       ON s.cluster = c.cluster
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35'
GROUP BY c.cluster, brand, s.sumBrandAmount
-- filter on the raw count, because brand_amount is now a ratio
HAVING COUNT(o.id) > 100
ORDER BY c.cluster ASC, brand_amount DESC;
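
To try the combined query without a MySQL server, the following Python sketch runs a variant of it on a tiny invented data set, with the subquery grouped per cluster and joined on the cluster column. Everything here is an assumption made for illustration: sqlite3 stands in for MySQL, the schema prefix is dropped, the column layouts are guessed from the joins in the question, the rows and the round labels 'r1' and 'r0' are made up, and the names brand_shares, total, share and min_count are hypothetical. As before, the `* 1.0` works around SQLite's integer division.

```python
import sqlite3

SCHEMA = """
CREATE TABLE persons    (id INTEGER PRIMARY KEY);
CREATE TABLE brands     (id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE articles   (id INTEGER PRIMARY KEY, brand_id INTEGER);
CREATE TABLE orders     (id INTEGER PRIMARY KEY, pid INTEGER, aid INTEGER);
CREATE TABLE clustering (id INTEGER PRIMARY KEY, pid INTEGER,
                         cluster INTEGER, cluster_round TEXT);
INSERT INTO persons VALUES (1), (2), (3);
INSERT INTO brands VALUES (1, 'bos'), (2, 'phi');
INSERT INTO articles VALUES (1, 1), (2, 2);
-- person 1 orders bos, bos, phi; person 2 orders bos; person 3 orders phi, phi
INSERT INTO orders VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2),
                          (4, 2, 1), (5, 3, 2), (6, 3, 2);
-- round 'r1': persons 1 and 2 in cluster 0, person 3 in cluster 1;
-- the 'r0' row belongs to a different round and must be filtered out
INSERT INTO clustering VALUES (1, 1, 0, 'r1'), (2, 2, 0, 'r1'),
                              (3, 3, 1, 'r1'), (4, 1, 1, 'r0');
"""

QUERY = """
SELECT c.cluster, ab.brand,
       COUNT(o.id) * 1.0 / NULLIF(s.total, 0) AS share
FROM clustering AS c
LEFT JOIN persons  AS p  ON c.pid = p.id
LEFT JOIN orders   AS o  ON p.id = o.pid
LEFT JOIN articles AS a  ON o.aid = a.id
LEFT JOIN brands   AS ab ON a.brand_id = ab.id
LEFT JOIN (SELECT c1.cluster, COUNT(o1.id) AS total
           FROM clustering AS c1
           LEFT JOIN persons AS p1 ON p1.id = c1.pid
           LEFT JOIN orders  AS o1 ON o1.pid = p1.id
           WHERE c1.cluster_round = :cluster_round
           GROUP BY c1.cluster) AS s
       ON s.cluster = c.cluster
WHERE c.cluster_round = :cluster_round
GROUP BY c.cluster, ab.brand, s.total
HAVING COUNT(o.id) > :min_count
ORDER BY c.cluster ASC, share DESC
"""


def brand_shares(cluster_round, min_count=0):
    """Return (cluster, brand, share) rows for one clustering round."""
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
    try:
        conn.executescript(SCHEMA)
        params = {"cluster_round": cluster_round, "min_count": min_count}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in brand_shares("r1"):
        print(row)
```

Passing the round and the count threshold as bound parameters keeps the two WHERE clauses in sync, which matters because the per-cluster total is only meaningful when it is computed over the same round as the per-brand counts.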
