对某些日期字符串迭代 SQL 查询



我有一组 BigQuery 表,它们每天收集测试结果,每个表的名称都像various_tests.test_name_20190523.我有一个查询,我可以在指定的日期范围内运行该查询以查找失败次数和失败与所有测试的比率,但我更愿意获得一个包含多个日期范围的表格,每个日期范围都作为表中的一行,例如BETWEEN "20190901" AND "20190916", BETWEEN "20190916" AND "20191001", BETWEEN "20191001" AND "20191016".每行的列将与此处的单行结果相同。有什么好方法可以做到这一点?

SELECT
"20190916" as StartDate, "20191001" as EndDate, 
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE
_TABLE_SUFFIX BETWEEN "20190916" AND "20191001"

(实际查询在WHERECASE子句中还有几个条件,但为了清楚起见,省略了它们。

一种方法是使用脚本。下面的示例使用bigquery-public-data.google_analytics_sample.ga_sessions_*来演示这个想法。您可以轻松地将其适应您的情况。

也可以根据您的需要生成date_ranges

DECLARE date_ranges ARRAY<STRUCT<s STRING, e STRING>>
DEFAULT [
('20170801', '20170802'), 
('20170703', '20170704'),
('20170603', '20170604')
];
DECLARE index INT64 DEFAULT 0;
CREATE TEMP TABLE result(s STRING, e STRING, cnt INT64); 
LOOP
IF index = array_length(date_ranges) 
THEN BREAK;
END IF;
BEGIN
DECLARE date_start STRING DEFAULT date_ranges[OFFSET(index)].s;
DECLARE date_end STRING DEFAULT date_ranges[OFFSET(index)].e;
INSERT INTO result
SELECT date_start, date_end, count(*) cnt
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` 
WHERE _TABLE_SUFFIX BETWEEN date_start and date_end ;
SET index = index + 1;
END;
END LOOP;
SELECT * FROM result;

输出

+----------+----------+------+
|    s     |    e     | cnt  |
+----------+----------+------+
| 20170703 | 20170704 | 3984 |
| 20170603 | 20170604 | 2933 |
| 20170801 | 20170802 | 2556 |
+----------+----------+------+

成本

与手动更改 start_date/end_date 以多次运行的成本相同。

性能

不如单个查询(您必须手动烹饪(高效,因为必须单独运行多个 INSERT INTO 。

可扩展性

临时表仍受每个表的每日 DML 配额限制,因此单个脚本中只能执行 1000 次插入。

试试下面

SELECT
-- "20190916" as StartDate, "20191001" as EndDate, 
_TABLE_SUFFIX AS Day,
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE
_TABLE_SUFFIX BETWEEN "20190916" AND "20191001"
GROUP BY DAY

创建多个日期范围并为每个日期范围运行查询的方法

SELECT
-- "20190916" as StartDate, "20191001" as EndDate, 
CASE 
WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN "20190916"
WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN "20180916"
END AS StartDate,
CASE 
WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN "20191001"
WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN "20181001"
END AS EndDate,
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" 
OR _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" 
GROUP BY StartDate, EndDate  

。一种更简单的方法...

为避免重复所有条件 - 请尝试以下操作

SELECT
-- "20190916" as StartDate, "20191001" as EndDate, 
(CASE 
WHEN _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" THEN STRUCT("20190916" AS StartDate, "20191001" AS EndDate)
WHEN _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" THEN ("20180916", "20181001")
END).*,
SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end) as Bad, COUNT(*) as Total,
(SUM(CASE WHEN status IN ("BAD") AND foo = 'bar' then 1 else 0 end)/ COUNT(*)) as Ratio
FROM
`various_tests.test_name_*`
WHERE _TABLE_SUFFIX BETWEEN "20190916" AND "20191001" 
OR _TABLE_SUFFIX BETWEEN "20180916" AND "20181001" 
GROUP BY StartDate, EndDate  

最新更新