这是我到目前为止的查询
SELECT
user_id,
conversation_id,
LAG(end_time)
OVER (PARTITION BY user_id ORDER BY start_time) AS last_convo
(unix_timestamp(start_time) - lag(unix_timestamp(end_time)))/60
OVER (PARTITION BY user_id ORDER BY start_time) AS time_between_end_newstart
FROM mt_data
ORDER BY user_id
我有user_id, conversation_id, start_time, end_time为每个会话。我正在寻找的是每个客户(如果他们有多个)对话A结束和对话B开始之间的时间。获得每个客户的结果以及从每个客户的第一个convo开始计算的地方是我正在努力的。谢谢你提前给我的一些专业提示
|user_id|conversation_id|start_time |end_time |
|-------|---------------|-------------------|-------------------|
|A |123 |2021-09-21 00:22:50|2021-09-21 00:28:07|
|B |122 |2021-09-21 07:56:00|2021-09-21 09:54:36|
|B |111 |2021-09-21 09:55:33|2021-09-21 10:19:08|
Desired outcome
|user_id|conversation_id|start_time |end_time |time_between_end_newstart|
|-------|---------------|-------------------|-------------------|-------------------------
|A |123 |2021-09-21 00:22:50|2021-09-21 00:28:07|0
|B |122 |2021-09-21 07:56:00|2021-09-21 09:54:36|0
|B |111 |2021-09-21 09:55:33|2021-09-21 10:19:08|0:00:57
这是一个起点,删除错误并正确使用LAG
:
SELECT user_id
, conversation_id
, start_time
, end_time
, COALESCE(
( unix_timestamp(start_time)
- lag(unix_timestamp(end_time )) OVER (PARTITION BY user_id ORDER BY start_time)
), 0) AS time_between_end_newstart
FROM mt_data
ORDER BY user_id, start_time
;
This result isn't with hadoop, but with a database which has similar support.
+---------+-----------------+---------------------+---------------------+---------------------------+
| user_id | conversation_id | start_time | end_time | time_between_end_newstart |
+---------+-----------------+---------------------+---------------------+---------------------------+
| A | 123 | 2021-09-21 00:22:50 | 2021-09-21 00:28:07 | 0 |
| B | 122 | 2021-09-21 07:56:00 | 2021-09-21 09:54:36 | 0 |
| B | 111 | 2021-09-21 09:55:33 | 2021-09-21 10:19:08 | 57 |
+---------+-----------------+---------------------+---------------------+---------------------------+
我无法访问hadoop
进行测试,但如果上述方法对您有效,那么剩下的唯一步骤就是正确格式化时差。
也许像这样:
SELECT user_id
, conversation_id
, start_time
, end_time
, COALESCE(
from_unixtime(
( unix_timestamp(start_time)
- lag(unix_timestamp(end_time )) OVER (PARTITION BY user_id ORDER BY start_time)
)
, 'HH:mm:ss'
)
, 0
) AS time_between_end_newstart
FROM mt_data
ORDER BY user_id, start_time
;
对不起,我也不能测试这个。
以下是我用作测试用例的设置(使用不同的数据库),作为示例:
CREATE TABLE mt_data (
user_id varchar(10)
, conversation_id int
, start_time datetime
, end_time datetime
);
INSERT INTO mt_data VALUES
('A' ,123 ,'2021-09-21 00:22:50','2021-09-21 00:28:07')
, ('B' ,122 ,'2021-09-21 07:56:00','2021-09-21 09:54:36')
, ('B' ,111 ,'2021-09-21 09:55:33','2021-09-21 10:19:08')
;