Sorting one array column by another array column in BigQuery



I have the following table in BigQuery:

WITH results AS
(SELECT 1 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.1,0.4,0.3,0.2] as probability
UNION ALL
SELECT 2 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.2,0.1,0.6,0.1] as probability
UNION ALL
SELECT 3 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.5,0.05,0.35,0.1] as probability
)
select * from results

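Here, each customer has a certain probability of buying each fruit. For each customer, I want to pick the top 2 fruits together with their corresponding purchase probabilities.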

Ideally, the output would look something like this:

customerid, fruits, probability
1, bananas, 0.4
1, grapes, 0.3
..

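In the final result above, for customerid 1 I picked only bananas and grapes, because these 2 fruits have the highest purchase probabilities (0.4 and 0.3 out of [0.1,0.4,0.3,0.2]).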

Is there any function in BigQuery I can use to achieve this?

Below is for BigQuery Standard SQL:

#standardSQL
WITH results AS (
SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability   UNION ALL
SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability   UNION ALL
SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
)
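SELECT customerid, fruit, probability
FROM (
  SELECT customerid,
    -- keep the two (fruit, probability) pairs with the highest probability
    ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2) top
  FROM results,
  -- flatten both arrays and pair the elements that share the same position
  UNNEST(probability) probability WITH OFFSET off1
  JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
  ON off1 = off2
  GROUP BY customerid
), UNNEST(top)  -- turn each array of structs back into rows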

with result

Row customerid  fruit   probability  
1   1           bananas 0.4  
2   1           grapes  0.3  
3   2           grapes  0.6  
4   2           apples  0.2  
5   3           apples  0.5  
6   3           grapes  0.35     
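To make the mechanics of the query above easier to follow, here is a minimal Python sketch (not BigQuery code) that mimics it step by step on the same sample data: pair the array elements that share an offset, group by customerid, keep the two highest probabilities, then flatten back into rows. The function name `top_k_grouped` and the `RESULTS` list are hypothetical illustrations, not part of the original answer.

```python
from collections import defaultdict

# Sample rows copied from the question's `results` CTE (hypothetical name RESULTS).
RESULTS = [
    (1, ["apples", "bananas", "grapes", "orange"], [0.1, 0.4, 0.3, 0.2]),
    (2, ["apples", "bananas", "grapes", "orange"], [0.2, 0.1, 0.6, 0.1]),
    (3, ["apples", "bananas", "grapes", "orange"], [0.5, 0.05, 0.35, 0.1]),
]


def top_k_grouped(rows, k=2):
    """Hypothetical helper that mimics the first query on in-memory rows."""
    grouped = defaultdict(list)
    for customerid, fruits, probs in rows:
        # UNNEST(probability) WITH OFFSET off1
        # JOIN UNNEST(fruit_array) WITH OFFSET off2 ON off1 = off2
        for off1, prob in enumerate(probs):
            for off2, fruit in enumerate(fruits):
                if off1 == off2:
                    grouped[customerid].append((fruit, prob))
    out = []
    for customerid in sorted(grouped):  # GROUP BY customerid
        # ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT k)
        top = sorted(grouped[customerid], key=lambda fp: fp[1], reverse=True)[:k]
        # outer UNNEST(top): one output row per kept struct
        out.extend((customerid, fruit, prob) for fruit, prob in top)
    return out


for row in top_k_grouped(RESULTS):
    print(row)
```

This prints the same six rows as the result table above. One caveat: Python's sort is stable, so tied probabilities keep their array order, whereas BigQuery does not guarantee any order among tied values in `ORDER BY ... LIMIT`.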

Or, probably a slightly better option:

#standardSQL
WITH results AS (
SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability   UNION ALL
SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability   UNION ALL
SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
)
SELECT customerid, fruit, probability
FROM (
  SELECT customerid,
    -- correlated subquery: computes the top 2 within each row's own arrays,
    -- so no GROUP BY over the whole table is needed
    (
      SELECT ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2)
      FROM UNNEST(probability) probability WITH OFFSET off1
      JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
      ON off1 = off2
    ) top
  FROM results
), UNNEST(top)

with the same result
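The two queries differ in where the top 2 is computed. The second one uses a correlated subquery over each row's own arrays, so it needs no GROUP BY across the table, which is presumably why it is the slightly better option. The flip side is that if the same customerid appeared in several rows, you would get a top 2 per row rather than per customer. Below is a minimal Python sketch of that per-row logic (again not BigQuery code; `top_k_per_row` and `RESULTS` are hypothetical names), using `zip` to pair elements by position and `heapq.nlargest` in place of `ORDER BY probability DESC LIMIT 2`.

```python
import heapq

# Sample rows copied from the question's `results` CTE (hypothetical name RESULTS).
RESULTS = [
    (1, ["apples", "bananas", "grapes", "orange"], [0.1, 0.4, 0.3, 0.2]),
    (2, ["apples", "bananas", "grapes", "orange"], [0.2, 0.1, 0.6, 0.1]),
    (3, ["apples", "bananas", "grapes", "orange"], [0.5, 0.05, 0.35, 0.1]),
]


def top_k_per_row(rows, k=2):
    """Hypothetical helper that mimics the second (per-row subquery) query."""
    out = []
    for customerid, fruits, probs in rows:  # FROM results: one row at a time
        # zip pairs elements at the same position, like JOIN ... ON off1 = off2;
        # nlargest plays the role of ORDER BY probability DESC LIMIT k
        top = heapq.nlargest(k, zip(fruits, probs), key=lambda fp: fp[1])
        # UNNEST(top): one output row per kept (fruit, probability) pair
        out.extend((customerid, fruit, prob) for fruit, prob in top)
    return out


for row in top_k_per_row(RESULTS):
    print(row)
```

On the sample data this yields the same six rows as the grouped version, since each customerid occurs in exactly one row.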
