Bigquery聚合列数组数据



考虑一个BigQuery视图,其中有一个列ARRAY<STRUCT<col1 FLOAT, col2 FLOAT>>称为x

这是检索那些带有附加列的行的最佳方式,该列是对"x";数组中。有可能通过大查询存储过程来实现吗?

如果有这样的东西就太好了:

select computation(X), * from something

computation(X)将循环遍历X数组中的各个元素,并使用一些附加规则对它们求和。关于这一点,似乎BigQuery函数不处理结构/数组,而只处理标量类型,因此由于数据类型是ARRAY>,所以不能考虑X。循环仍然是需要的,所以我正在考虑使用一个过程,但我找不到任何例子,对可重复列进行操作,并给出查询本身的结果。

其中一个要点是保持这个查询在BigQuery查询控制台可用,避免像外部脚本(如python)这样的解决方案。

是否有这样的例子:过程内部有一个select,并返回额外的信息给它?就像查询上的映射函数。一种过滤器,类似于前面的函数计算(X)示例。

根据要求,为了给出更多的上下文,我有一个列是:

ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
start_usage_amount FLOAT64, usd_amount FLOAT64, account_currency_amount FLOAT64>>

包含GCP价格等级。我必须遍历所有这些,然后计算最终的价格。FLOAT64是货币的可靠数据类型的占位符。我还在寻找BigQuery.

我将达到类似的东西,实现一个名为get_tiers_total_expense的函数


-- Ex. Those are 2 tiers.
-- Tier 1: It starts from 0 usages and goes until 20. The costs 10 for every unit used.
-- Tier 2: It starts from 20 usages and cost is 5
select get_tiers_total_expense(array(
select as struct 1.0, 0.0, 10.0, 9.0 union all
select as struct 1.0, 20.0, 5.0, 4.0 as tiered_rates));

您可以考虑如下。(我建议使用SQL UDF而不是JS UDF)

-- sample data
CREATE TEMP TABLE tiers AS
SELECT 1.0 pricing_unit_quantity, 0.0 start_usage_amount,
10.0 usd_amount, 9.0 account_currency_amount UNION ALL
SELECT 1.0, 20.0, 5.0, 4.0;
-- define UDFs here
CREATE TEMP FUNCTION get_tiers_total_expense (
tiers ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
start_usage_amount FLOAT64,
usd_amount FLOAT64,
account_currency_amount FLOAT64>>
) AS ((
-- it takes tiers and return the final cost
-- -> you can adjust the aggregation logic
SELECT SUM(pricing_unit_quantity * usd_amount * account_currency_amount)
FROM UNNEST(tiers) tier
));
-- sample query here using UDFs and sample data
SELECT get_tiers_total_expense(
ARRAY_AGG(
STRUCT(
pricing_unit_quantity,
start_usage_amount,
usd_amount,
account_currency_amount
)
)
) AS get_tiers_total_expense
FROM tiers t
--GROUP BY -- later you can change group-by columns depending on your use cases.
-- query result
+-------------------------+
| get_tiers_total_expense |
+-------------------------+
|                   110.0 |
+-------------------------+

我最终按照@jaytiger的建议创建了一个BigQuery UDF。

我必须遍历所有的gcp层来计算最终的成本。我使用FLOAT作为货币的占位符。我当然需要弄清楚哪种数据类型更适合在BigQuery中存储它们…

这里是部分解:

CREATE OR REPLACE FUNCTION get_tiers_total_expense(
tiers ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
start_usage_amount FLOAT64, usd_amount FLOAT64, account_currency_amount FLOAT64>>
)
RETURNS NUMBER
LANGUAGE js AS """
// it takes tiers and return the final cost
// Luckily float64 are converted to Number in JS and not float
// that is not great for currency computation
""";

-- Ex. Those are 2 tiers.
-- Tier 1: It starts from 0 usages and goes until 20. The costs 10 for every unit used.
-- Tier 2: It starts from 20 usages and cost is 5
select get_tiers_total_expense(array(
select as struct 1.0, 0.0, 10.0, 9.0 union all
select as struct 1.0, 20.0, 5.0, 4.0 as tiered_rates));

相关内容

  • 没有找到相关文章

最新更新