Google BigQuery SQL: making a rolling-average subquery or join more efficient for large datasets

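First off, I have already figured out how to get what I need using either a subquery or a join. The problems I keep running into, given my lack of experience with GBQ:

GBQ does not allow "correlated subqueries".

The join seems to take a very long time (+3 hours) because of the amount of data I am querying (+5000 rows), so I suspect the query is inefficient.

Basically, for each row I am computing the average over some other rows, namely those that satisfy the condition (current_row_value - x <= other_row_value < current_row_value - 1).

Using https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_join as the data, together with the following query: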

SELECT *, (select avg(Quantity) from OrderDetails as table_2 where table_2.OrderId between table_1.OrderId-3 and table_1.OrderId-1) as avg_quant_3 FROM OrderDetails as table_1 order by OrderId asc

It outputs the result I need:

Number of Records: 518
OrderDetailID   OrderID ProductID   Quantity    avg_quant_3
1   10248   11  12  null
2   10248   42  10  null
3   10248   72  5   null
4   10249   14  9   27
5   10249   51  40  27
6   10250   41  10  76
7   10250   51  35  76
8   10250   65  15  76
9   10251   22  6   136
10  10251   57  15  136
11  10251   65  20  136
12  10252   20  40  150

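I cannot use the query format above because GBQ does not accept correlated subqueries. So here is the join version, whose results are slightly different (rows for which no average can be computed are omitted) but still correct. I also put every selected column in the GROUP BY, because GBQ will not accept a query that uses an aggregate function unless all selected columns are either grouped or aggregated: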

SELECT table_1.OrderDetailID, table_1.OrderID,table_1.ProductID, table_1.Quantity, sum(table_2.quantity) FROM OrderDetails as table_1
join OrderDetails as table_2 on table_2.OrderId between table_1.OrderId-3 and table_1.OrderId-1  
group by table_1.OrderDetailID, table_1.OrderID,table_1.ProductID, table_1.Quantity

Number of Records: 515
OrderDetailID   OrderID ProductID   Quantity    sum(table_2.quantity)
4   10249   14  9   27
5   10249   51  40  27
6   10250   41  10  76
7   10250   51  35  76
8   10250   65  15  76
9   10251   22  6   136
10  10251   57  15  136
11  10251   65  20  136
12  10252   20  40  150

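The problem is that the join takes +3 hours and then actually fails because it runs for too long. In my experience with GBQ so far, joins seem to take a long time, but then again I am querying a large dataset. I would like to know whether there is another way to get this information with a more efficient query, and I hope to learn something that will make my GBQ queries more efficient in the future. I also tried running the join version of the query on 5M rows; it has been running for +1 hour, so I expect that one to fail as well.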

Can you show the error you get when you try to run the "correlated subquery"? The following query works for me:

create temp table table1
as select 1 as x, 2 as y
union all select 3, 4;
create temp table table2
as select 3 x;
select *, (select avg(y) from table1 where table1.x = table2.x)
from table2
order by x;

You seem to want a sum. Use a window function:

select t.*,
sum(quantity) over (order by orderid
range between 3 preceding and 1 preceding
) as TheThingYouCallAnAverage
from t;
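
To make the window-function approach easier to try out, here is a minimal self-contained sketch that inlines the twelve sample rows shown in the question. The CTE name order_details, the window name w, and the output column names sum_quant_3 and avg_quant_3 are illustrative assumptions, not names from the original data. It computes both the sum and the average over the same frame, so either can be picked:

```sql
-- Sample rows copied from the question's output. The CTE name
-- order_details and the output column names are assumptions for illustration.
WITH order_details AS (
  SELECT 1 AS OrderDetailID, 10248 AS OrderID, 11 AS ProductID, 12 AS Quantity
  UNION ALL SELECT 2, 10248, 42, 10
  UNION ALL SELECT 3, 10248, 72, 5
  UNION ALL SELECT 4, 10249, 14, 9
  UNION ALL SELECT 5, 10249, 51, 40
  UNION ALL SELECT 6, 10250, 41, 10
  UNION ALL SELECT 7, 10250, 51, 35
  UNION ALL SELECT 8, 10250, 65, 15
  UNION ALL SELECT 9, 10251, 22, 6
  UNION ALL SELECT 10, 10251, 57, 15
  UNION ALL SELECT 11, 10251, 65, 20
  UNION ALL SELECT 12, 10252, 20, 40
)
SELECT
  t.*,
  -- The frame covers rows whose OrderID lies in [OrderID - 3, OrderID - 1],
  -- the same range as the BETWEEN condition in the join version.
  SUM(Quantity) OVER w AS sum_quant_3,
  AVG(Quantity) OVER w AS avg_quant_3
FROM order_details AS t
WINDOW w AS (ORDER BY OrderID RANGE BETWEEN 3 PRECEDING AND 1 PRECEDING)
ORDER BY OrderDetailID;
```

On these rows sum_quant_3 is NULL for order 10248 and then 27, 76, 136 and 150 for orders 10249 through 10252, which matches the figures in the question; avg_quant_3 comes out as 9, 15.2, 17 and 18.75 for the same orders. Because the frame is defined with RANGE rather than ROWS, it is bounded by OrderID values, not by row counts, so several detail rows per order are handled the same way as in the join. Rows with an empty frame are kept with NULL instead of being dropped, unlike the inner-join version.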
