考虑一个BigQuery视图,其中有一个列ARRAY<STRUCT<col1 FLOAT, col2 FLOAT>>
称为x
这是检索那些带有附加列的行的最佳方式,该列是对"x";数组中。有可能通过大查询存储过程来实现吗?
如果有这样的东西就太好了:
select computation(X), * from something
computation(X)
将循环遍历X数组中的各个元素,并使用一些附加规则对它们求和。关于这一点,似乎BigQuery函数不处理结构/数组,而只处理标量类型,因此由于数据类型是ARRAY
其中一个要点是保持这个查询在BigQuery查询控制台可用,避免像外部脚本(如python)这样的解决方案。
是否有这样的例子:过程内部有一个select,并返回额外的信息给它?就像查询上的映射函数。一种过滤器,类似于前面的函数计算(X)示例。
根据要求,为了给出更多的上下文,我有一个列是:
ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
start_usage_amount FLOAT64, usd_amount FLOAT64, account_currency_amount FLOAT64>>
包含GCP价格等级。我必须遍历所有这些,然后计算最终的价格。FLOAT64
是货币的可靠数据类型的占位符。我还在寻找BigQuery.
我将达到类似的东西,实现一个名为get_tiers_total_expense
的函数
-- Ex. Those are 2 tiers.
-- Tier 1: It starts from 0 usages and goes until 20. The costs 10 for every unit used.
-- Tier 2: It starts from 20 usages and cost is 5
select get_tiers_total_expense(array(
select as struct 1.0, 0.0, 10.0, 9.0 union all
select as struct 1.0, 20.0, 5.0, 4.0 as tiered_rates));
您可以考虑如下。(我建议使用SQL UDF而不是JS UDF)
-- sample data
CREATE TEMP TABLE tiers AS
SELECT 1.0 pricing_unit_quantity, 0.0 start_usage_amount,
10.0 usd_amount, 9.0 account_currency_amount UNION ALL
SELECT 1.0, 20.0, 5.0, 4.0;
-- define UDFs here
CREATE TEMP FUNCTION get_tiers_total_expense (
tiers ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
start_usage_amount FLOAT64,
usd_amount FLOAT64,
account_currency_amount FLOAT64>>
) AS ((
-- it takes tiers and return the final cost
-- -> you can adjust the aggregation logic
SELECT SUM(pricing_unit_quantity * usd_amount * account_currency_amount)
FROM UNNEST(tiers) tier
));
-- sample query here using UDFs and sample data
SELECT get_tiers_total_expense(
ARRAY_AGG(
STRUCT(
pricing_unit_quantity,
start_usage_amount,
usd_amount,
account_currency_amount
)
)
) AS get_tiers_total_expense
FROM tiers t
--GROUP BY -- later you can change group-by columns depending on your use cases.
-- query result
+-------------------------+
| get_tiers_total_expense |
+-------------------------+
| 110.0 |
+-------------------------+
我最终按照@jaytiger的建议创建了一个BigQuery UDF。
我必须遍历所有的gcp层来计算最终的成本。我使用FLOAT
作为货币的占位符。我当然需要弄清楚哪种数据类型更适合在BigQuery中存储它们…
这里是部分解:
CREATE OR REPLACE FUNCTION get_tiers_total_expense(
tiers ARRAY<STRUCT<pricing_unit_quantity FLOAT64,
start_usage_amount FLOAT64, usd_amount FLOAT64, account_currency_amount FLOAT64>>
)
RETURNS NUMBER
LANGUAGE js AS """
// it takes tiers and return the final cost
// Luckily float64 are converted to Number in JS and not float
// that is not great for currency computation
""";
-- Ex. Those are 2 tiers.
-- Tier 1: It starts from 0 usages and goes until 20. The costs 10 for every unit used.
-- Tier 2: It starts from 20 usages and cost is 5
select get_tiers_total_expense(array(
select as struct 1.0, 0.0, 10.0, 9.0 union all
select as struct 1.0, 20.0, 5.0, 4.0 as tiered_rates));