In BigQuery, we are trying to run:
SELECT day, AVG(value)/(1024*1024) FROM (
SELECT value, UTC_USEC_TO_DAY(timestamp) as day,
PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank
FROM [Datastore.PerformanceDatum]
WHERE type = "MemoryPerf"
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day desc;
This should return a relatively small amount of data, but instead we get this message:
Error: Resources exceeded during query execution. The query contained a GROUP BY operator, consider using GROUP EACH BY instead. For more details, please see https://developers.google.com/bigquery/docs/query-reference#groupby
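What is causing this query to fail? Is it the size of the subquery? Is there an equivalent query we could run that avoids the problem?
Edit in response to comments: if I add GROUP EACH BY (and remove the outer ORDER BY), the query fails, claiming that GROUP EACH BY is not parallelizable here.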
I wrote an equivalent query that works for me:
SELECT day, AVG(value)/(1024*1024) FROM (
SELECT data value, UTC_USEC_TO_DAY(dtimestamp) as day,
PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank
FROM [io_sensor_data.moscone_io13]
WHERE sensortype = "humidity"
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day desc;
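If I run only the inner query, I get 3,660,624 results. Is your dataset bigger than that?
When grouping by day, the outer SELECT gives me only 4 results. I'll try a different grouping to see whether I can hit the limit: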
SELECT day, AVG(value)/(1024*1024) FROM (
SELECT data value, dtimestamp / 1000 as day,
PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank
FROM [io_sensor_data.moscone_io13]
WHERE sensortype = "humidity"
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day desc;
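Now there are 57,862 distinct groups, and the query still works.
I tried different combinations to reproduce the error. I get the same error you did once I double the initial amount of data. A simple "hack" to double the data is to change: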
FROM [io_sensor_data.moscone_io13]
to:
FROM [io_sensor_data.moscone_io13], [io_sensor_data.moscone_io13]
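Then I get the same error. How much data do you have? Can you apply an additional filter? Since you are already partitioning PERCENTILE_RANK by day, could you add an extra condition so that the query analyzes only a fraction of the days (for example, only the last month)?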
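As a rough sketch of that suggestion, here is the original query with one extra filter on the inner SELECT. It assumes that `timestamp` holds microseconds since the Unix epoch (which is what UTC_USEC_TO_DAY expects) and that "last month" means the month before the current time. I have not run it against your table, so treat it as a starting point rather than a guaranteed fix:

```sql
-- Legacy BigQuery SQL, same dialect as the queries above.
-- Assumption: `timestamp` is in microseconds since the Unix epoch.
SELECT day, AVG(value)/(1024*1024) FROM (
SELECT value, UTC_USEC_TO_DAY(timestamp) as day,
PERCENTILE_RANK() OVER (PARTITION BY day ORDER BY value ASC) as rank
FROM [Datastore.PerformanceDatum]
WHERE type = "MemoryPerf"
-- Extra filter: only rows from roughly the last month reach the window function.
AND timestamp >= TIMESTAMP_TO_USEC(DATE_ADD(CURRENT_TIMESTAMP(), -1, "MONTH"))
) WHERE rank >= 0.9 AND rank <= 0.91
GROUP BY day
ORDER BY day desc;
```

If a month is still too much data, narrowing the interval (for example to -7, "DAY") and running the query once per range would be the next thing to try.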