Left join lateral for a conditional sum



I have a purchase dataset with customers, products, and categories.

customer     product     category    sales_value
A     aerosol     air_care             10
B     aerosol     air_care             12
C     aerosol     air_care              7
A     perfume     air_care              8
A     perfume     air_care              2
D     perfume     air_care             11
C      burger         food             13
D       fries         food              6
C       fries         food              9

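For each product, I want the ratio between the sales value spent on that product and the sales value spent on the product's category, counting only customers who bought the product at least once.

Put another way: take the customers who bought fries at least once, and for all of those customers compute A) the sum of sales value spent on fries and B) the sum of sales value spent on food.

The intermediate table would look like this: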

product  category  sum_spent_on_product  sum_spent_on_category_by_people_buying_product  ratio
aerosol  air_care                    29                                              39   0.74
perfume  air_care                    21                                              31   0.68
burger   food                        13                                              22   0.59
fries    food                        15                                              28   0.54

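Example (with illustrative numbers rather than the sample data above): people who bought aerosol at least once spent a total of 1800 on this product. Overall, those same people spent 3600 on the air_care category (to which aerosol belongs). The ratio for aerosol would therefore be 0.5.

I tried to solve this with a left join lateral, computing the intermediate results for each product, but I can't figure out how to include the condition "only for customers who bought this specific product":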
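As a cross-check of the definition, the intermediate table can be reproduced by brute force outside the database. The following is a minimal Python sketch and not part of the original question; `ROWS` and `product_ratios` are names made up for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (customer, product, category, sales_value).
ROWS = [
    ("A", "aerosol", "air_care", 10),
    ("B", "aerosol", "air_care", 12),
    ("C", "aerosol", "air_care", 7),
    ("A", "perfume", "air_care", 8),
    ("A", "perfume", "air_care", 2),
    ("D", "perfume", "air_care", 11),
    ("C", "burger", "food", 13),
    ("D", "fries", "food", 6),
    ("C", "fries", "food", 9),
]


def product_ratios(rows):
    """Return {product: (spent_on_product, spent_on_category_by_buyers, ratio)}."""
    category_of = {}                        # product -> its category
    buyers = defaultdict(set)               # product -> customers who bought it
    spent_on_product = defaultdict(int)     # product -> total sales value
    spent_in_category = defaultdict(int)    # (customer, category) -> total sales value
    for customer, product, category, value in rows:
        category_of[product] = category
        buyers[product].add(customer)
        spent_on_product[product] += value
        spent_in_category[(customer, category)] += value

    result = {}
    for product, customers in buyers.items():
        category = category_of[product]
        # Each buyer is counted once, however many times they bought the product.
        on_category = sum(spent_in_category[(c, category)] for c in customers)
        on_product = spent_on_product[product]
        result[product] = (on_product, on_category, on_product / on_category)
    return result


ratios = product_ratios(ROWS)
for product, (on_product, on_category, ratio) in sorted(ratios.items()):
    print(f"{product:8} {on_product:3} {on_category:3} {ratio:.2f}")
```

For the sample data this yields the pairs 29/39 (aerosol), 21/31 (perfume), 13/22 (burger) and 15/28 (fries). Using a set of buyers per product is what keeps a repeat purchase from counting the same customer's category total twice.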

select
    distinct (product_id)
    , category
    , c.sales_category
from transactions t
left join lateral (
    select
        sum(sales_value) as sales_category
    from transactions
    where category = t.category
    group by category
) c on true
;

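The query above lists, for each product, the total spent on the product's category, but without the required condition on product buyers.

Is left join lateral the right approach? Are there other solutions in plain SQL?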

I would use a window function to compute each customer's total spend in each category:

SELECT customer, product, category, sales_value,
       sum(sales_value) OVER (PARTITION BY customer, category) AS tot_cat
FROM transactions;

 customer | product | category | sales_value | tot_cat
----------+---------+----------+-------------+---------
 A        | aerosol | air_care |       10.00 |   20.00
 A        | perfume | air_care |        8.00 |   20.00
 A        | perfume | air_care |        2.00 |   20.00
 B        | aerosol | air_care |       12.00 |   12.00
 C        | aerosol | air_care |        7.00 |    7.00
 C        | fries   | food     |        9.00 |   22.00
 C        | burger  | food     |       13.00 |   22.00
 D        | perfume | air_care |       11.00 |   11.00
 D        | fries   | food     |        6.00 |    6.00

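Then we only need to sum things up. A problem arises when a customer buys the same product more than once, because each duplicate row would contribute that customer's tot_cat again. In your example, customer A bought perfume twice. To get around this, let's also group by customer, product, and category (summing the sales_value column):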

SELECT customer, product, category,
       SUM(sales_value) AS sales_value,
       SUM(SUM(sales_value)) OVER (PARTITION BY customer, category) AS tot_cat
FROM transactions
GROUP BY customer, product, category;

 customer | product | category | sales_value | tot_cat
----------+---------+----------+-------------+---------
 A        | aerosol | air_care |       10.00 |   20.00
 A        | perfume | air_care |       10.00 |   20.00 <-- this row summarizes rows 2 and 3 of previous result
 B        | aerosol | air_care |       12.00 |   12.00
 C        | aerosol | air_care |        7.00 |    7.00
 C        | burger  | food     |       13.00 |   22.00
 C        | fries   | food     |        9.00 |   22.00
 D        | perfume | air_care |       11.00 |   11.00
 D        | fries   | food     |        6.00 |    6.00
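The double counting that the grouped query above avoids can be reproduced by summing the ungrouped window result per product. This is a minimal sketch, assuming Python's standard sqlite3 module with SQLite 3.25 or later (needed for window functions) as a stand-in for Postgres; the table is rebuilt from the question's sample rows.

```python
import sqlite3

# Sample rows from the question: (customer, product, category, sales_value).
ROWS = [
    ("A", "aerosol", "air_care", 10), ("B", "aerosol", "air_care", 12),
    ("C", "aerosol", "air_care", 7), ("A", "perfume", "air_care", 8),
    ("A", "perfume", "air_care", 2), ("D", "perfume", "air_care", 11),
    ("C", "burger", "food", 13), ("D", "fries", "food", 6),
    ("C", "fries", "food", 9),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(customer TEXT, product TEXT, category TEXT, sales_value INTEGER)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", ROWS)

# Naive version: window function on the raw rows, no GROUP BY in the inner query.
NAIVE = """
WITH t AS (
    SELECT customer, product, category, sales_value,
           SUM(sales_value) OVER (PARTITION BY customer, category) AS tot_cat
    FROM transactions
)
SELECT product, SUM(sales_value), SUM(tot_cat)
FROM t
GROUP BY product
ORDER BY product
"""

naive = {product: (sales, tot) for product, sales, tot in conn.execute(NAIVE)}
print(naive["perfume"])  # category total is inflated by A's second perfume row
```

Customer A's category total of 20 is added once per perfume row, so perfume ends up with a category sum of 51 instead of 31. The other products have no repeat purchases and are unaffected.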

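Now we only need to sum sales_value and tot_cat to get the intermediate result table. I use a common table expression to make the previous result available under the name t: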

WITH t AS (
    SELECT customer, product, category,
           SUM(sales_value) AS sales_value,
           SUM(SUM(sales_value)) OVER (PARTITION BY customer, category) AS tot_cat
    FROM transactions
    GROUP BY customer, product, category
)
SELECT product, category,
       sum(sales_value) AS sales_value,
       sum(tot_cat) AS tot_cat,
       sum(sales_value) / sum(tot_cat) AS ratio
FROM t
GROUP BY product, category;

 product | category | sales_value | tot_cat |         ratio
---------+----------+-------------+---------+------------------------
 aerosol | air_care |       29.00 |   39.00 | 0.74358974358974358974
 fries   | food     |       15.00 |   28.00 | 0.53571428571428571429
 burger  | food     |       13.00 |   22.00 | 0.59090909090909090909
 perfume | air_care |       21.00 |   31.00 | 0.67741935483870967742
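The final query can be checked against the question's sample data in an in-memory database. The sketch below assumes Python's standard sqlite3 module with SQLite 3.25 or later rather than Postgres. The output above shows values such as 10.00, which suggests a numeric column there; here sales_value is created as INTEGER, as in the question's table, so the division is multiplied by 1.0. Without that, integer division would return 0 for every ratio in SQLite, and the same applies to integer columns in Postgres.

```python
import sqlite3

# Sample rows from the question: (customer, product, category, sales_value).
ROWS = [
    ("A", "aerosol", "air_care", 10), ("B", "aerosol", "air_care", 12),
    ("C", "aerosol", "air_care", 7), ("A", "perfume", "air_care", 8),
    ("A", "perfume", "air_care", 2), ("D", "perfume", "air_care", 11),
    ("C", "burger", "food", 13), ("D", "fries", "food", 6),
    ("C", "fries", "food", 9),
]

QUERY = """
WITH t AS (
    SELECT customer, product, category,
           SUM(sales_value) AS sales_value,
           SUM(SUM(sales_value)) OVER (PARTITION BY customer, category) AS tot_cat
    FROM transactions
    GROUP BY customer, product, category
)
SELECT product, category,
       SUM(sales_value) AS sales_value,
       SUM(tot_cat) AS tot_cat,
       -- 1.0 * forces floating-point division on integer sums
       ROUND(1.0 * SUM(sales_value) / SUM(tot_cat), 2) AS ratio
FROM t
GROUP BY product, category
ORDER BY product
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(customer TEXT, product TEXT, category TEXT, sales_value INTEGER)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", ROWS)

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

In Postgres, a cast such as `sum(sales_value)::numeric` on one side of the division would serve the same purpose as the `1.0 *` used here.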

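For each product, I want the ratio between the sales value spent on that product and the sales value spent on the product's category, counting only customers who bought the product at least once.

If I understand correctly, you can aggregate sales by customer and category to get the category total. In Postgres, you can keep an array of the products and use it for matching. The query then looks like this: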

select p.product, p.category,
       sum(p.sales_value) as product_only_sales,
       sum(pp.sales_value) as comparable_sales
from purchases p join
     (select customer, category,
             array_agg(distinct product) as products,
             sum(sales_value) as sales_value
      from purchases p
      group by customer, category
     ) pp
     on p.customer = pp.customer and
        p.category = pp.category and
        p.product = any (pp.products)
group by p.product, p.category;

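Here is a db<>fiddle.

EDIT:

The data allows duplicate rows for a product (the same customer can buy it more than once). That throws things off. The solution is to pre-aggregate by product for each customer: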

select p.product, p.category,
       sum(p.sales_value) as product_only_sales,
       sum(pp.sales_value) as comparable_sales
from (select customer, category, product, sum(sales_value) as sales_value
      from purchases p
      group by customer, category, product
     ) p join
     (select customer, category,
             array_agg(distinct product) as products,
             sum(sales_value) as sales_value
      from purchases p
      group by customer, category
     ) pp
     on p.customer = pp.customer and
        p.category = pp.category and
        p.product = any (pp.products)
group by p.product, p.category;

Here is a db<>fiddle for this example.
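One observation on the pre-aggregated query: because pp is grouped by customer and category from the same table, every product in p should already be contained in pp.products for the matching row, so the array test appears to be redundant for data of this shape. The sketch below drops the array and checks that the join still reproduces the expected totals. It assumes Python's standard sqlite3 module as a stand-in for Postgres (SQLite has no array_agg), with the purchases table rebuilt from the question's sample rows.

```python
import sqlite3

# Sample rows from the question: (customer, product, category, sales_value).
ROWS = [
    ("A", "aerosol", "air_care", 10), ("B", "aerosol", "air_care", 12),
    ("C", "aerosol", "air_care", 7), ("A", "perfume", "air_care", 8),
    ("A", "perfume", "air_care", 2), ("D", "perfume", "air_care", 11),
    ("C", "burger", "food", 13), ("D", "fries", "food", 6),
    ("C", "fries", "food", 9),
]

QUERY = """
SELECT p.product, p.category,
       SUM(p.sales_value)  AS product_only_sales,
       SUM(pp.sales_value) AS comparable_sales
FROM (SELECT customer, category, product, SUM(sales_value) AS sales_value
      FROM purchases
      GROUP BY customer, category, product) AS p   -- one row per customer and product
JOIN (SELECT customer, category, SUM(sales_value) AS sales_value
      FROM purchases
      GROUP BY customer, category) AS pp           -- one row per customer and category
  ON p.customer = pp.customer AND p.category = pp.category
GROUP BY p.product, p.category
ORDER BY p.product
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases "
    "(customer TEXT, product TEXT, category TEXT, sales_value INTEGER)"
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", ROWS)

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The pre-aggregation in p is still what matters: it guarantees one row per customer and product, so each buyer's category total from pp is added exactly once per product.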
