Average number of rides completed per 4-hour window

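I have a dataset where each ride has its own ride_id and a completion time. I want to know how many rides are completed, on average, every 4 hours.

Sample dataset: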

dropoff_datetime    ride_id
2022-08-27 11:42:02 1715
2022-08-24 05:59:26 1713
2022-08-23 17:40:05 1716
2022-08-28 23:06:01 1715
2022-08-27 03:21:29 1714

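For example, I want to know how many rides happened between 12 PM and 4 PM on 2022-08-27. Then, how many rides happened in the following 4 hours, from 4 PM to 8 PM on 2022-08-27?

What I have tried:

I first truncated dropoff_datetime to the hour (using DATE_TRUNC).
Then I grouped by that hour to get the number of rides per hour.

Sample query (note: the table above is called final):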

SELECT  DATE_TRUNC('hour', dropoff_datetime) as by_hour 
,count(ride_id) as total_rides 
FROM final 
WHERE 1=1 
GROUP BY 1

Result:

by_hour total_rides
2022-08-27 4:00:00  3756
2022-08-27 5:00:00  6710

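My question is: how can I group this into 4-hour windows instead of hourly ones?

This question really has two parts: how to generate the date ranges and how to count the data within them. One possible approach is to build the ranges from the minimum and maximum dates in the data and then join them back to the data: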

-- sample data
with dataset (dropoff_datetime, ride_id) AS
(VALUES (timestamp  '2022-08-24 11:42:02', 1715),
(timestamp  '2022-08-24 05:59:26', 1713),
(timestamp  '2022-08-24 05:29:26', 1712),
(timestamp  '2022-08-23 17:40:05', 1716)),
-- query part
min_max as (
select min(date_trunc('hour', dropoff_datetime)) d_min, max(date_trunc('hour', dropoff_datetime)) d_max
from dataset
),
date_ranges as (
select h
from min_max,
unnest (sequence(d_min, d_max, interval '4' hour)) t(h)
)
select h, count_if(ride_id is not null)
from date_ranges
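-- half-open window [h, h + 4h): a ride exactly on a boundary is counted once, not twice
left join dataset on dropoff_datetime >= h and dropoff_datetime < h + interval '4' hour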
group by h
order by h;

This produces the following output:

h                    _col1
2022-08-23 17:00:00  1
2022-08-23 21:00:00  0
2022-08-24 01:00:00  0
2022-08-24 05:00:00  2
2022-08-24 09:00:00  1
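
The query above returns one count per window, while the title asks for the average. A minimal sketch of that last step, assuming the same Presto/Trino-style dialect the answer appears to use: wrap the per-window counts in another CTE and average them. The names per_window, total_rides and avg_rides_per_4h are made up for illustration.

```sql
-- sample data (same rows as in the answer)
with dataset (dropoff_datetime, ride_id) as (
    values (timestamp '2022-08-24 11:42:02', 1715),
           (timestamp '2022-08-24 05:59:26', 1713),
           (timestamp '2022-08-24 05:29:26', 1712),
           (timestamp '2022-08-23 17:40:05', 1716)
),
min_max as (
    select min(date_trunc('hour', dropoff_datetime)) d_min,
           max(date_trunc('hour', dropoff_datetime)) d_max
    from dataset
),
date_ranges as (
    select h
    from min_max,
    unnest (sequence(d_min, d_max, interval '4' hour)) t(h)
),
-- one row per 4-hour window, including windows with zero rides
per_window as (
    select h, count(ride_id) as total_rides
    from date_ranges
    left join dataset
        on dropoff_datetime >= h
       and dropoff_datetime < h + interval '4' hour
    group by h
)
-- average number of rides across all generated windows
select avg(total_rides) as avg_rides_per_4h
from per_window;
```

Because empty windows are generated by date_ranges, they count toward the average. For the sample rows that is 4 rides spread over 5 windows, so 0.8.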

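The windows in the answer start at the earliest truncated hour in the data (17:00 for the sample), so they do not line up with the 12 PM to 4 PM and 4 PM to 8 PM windows from the question. A minimal sketch of an alternative, assuming the Presto/Trino functions hour, date_trunc and date_add are available: snap every timestamp back to the start of its clock-aligned 4-hour window and group by that. The column names window_start and total_rides are made up for illustration.

```sql
-- sample data (same rows as in the answer)
with dataset (dropoff_datetime, ride_id) as (
    values (timestamp '2022-08-24 11:42:02', 1715),
           (timestamp '2022-08-24 05:59:26', 1713),
           (timestamp '2022-08-24 05:29:26', 1712),
           (timestamp '2022-08-23 17:40:05', 1716)
)
select
    -- truncate to the hour, then step back (hour mod 4) hours,
    -- so windows start at 00:00, 04:00, 08:00, 12:00, 16:00 and 20:00
    date_add('hour',
             -(hour(dropoff_datetime) % 4),
             date_trunc('hour', dropoff_datetime)) as window_start,
    count(ride_id) as total_rides
from dataset
group by 1
order by 1;
```

Unlike the generated-range approach, this only returns windows that contain at least one ride, so empty windows would still need the sequence join if they should show up as zeros.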