计算每个客户会话A结束和会话B开始的时间差



这是我到目前为止的查询

SELECT  
user_id,
conversation_id,
LAG(end_time)
OVER (PARTITION BY user_id ORDER BY start_time) AS last_convo
(unix_timestamp(start_time) - lag(unix_timestamp(end_time)))/60
OVER (PARTITION BY user_id ORDER BY start_time) AS time_between_end_newstart
FROM    mt_data
ORDER BY user_id

我有user_id, conversation_id, start_time, end_time为每个会话。我正在寻找的是每个客户(如果他们有多个)对话A结束和对话B开始之间的时间。获得每个客户的结果以及从每个客户的第一个convo开始计算的地方是我正在努力的。谢谢你提前给我的一些专业提示

|user_id|conversation_id|start_time         |end_time           |
|-------|---------------|-------------------|-------------------|
|A      |123            |2021-09-21 00:22:50|2021-09-21 00:28:07|
|B      |122            |2021-09-21 07:56:00|2021-09-21 09:54:36|
|B      |111            |2021-09-21 09:55:33|2021-09-21 10:19:08|

Desired outcome
|user_id|conversation_id|start_time         |end_time           |time_between_end_newstart|
|-------|---------------|-------------------|-------------------|-------------------------
|A      |123            |2021-09-21 00:22:50|2021-09-21 00:28:07|0
|B      |122            |2021-09-21 07:56:00|2021-09-21 09:54:36|0
|B      |111            |2021-09-21 09:55:33|2021-09-21 10:19:08|0:00:57

这是一个起点,删除错误并正确使用LAG:

SELECT user_id
, conversation_id
, start_time
, end_time
, COALESCE(
(      unix_timestamp(start_time)
- lag(unix_timestamp(end_time  )) OVER (PARTITION BY user_id ORDER BY start_time)
), 0)  AS time_between_end_newstart
FROM mt_data
ORDER BY user_id, start_time
;
This result isn't with hadoop, but with a database which has similar support.
+---------+-----------------+---------------------+---------------------+---------------------------+
| user_id | conversation_id | start_time          | end_time            | time_between_end_newstart |
+---------+-----------------+---------------------+---------------------+---------------------------+
| A       |             123 | 2021-09-21 00:22:50 | 2021-09-21 00:28:07 |                         0 |
| B       |             122 | 2021-09-21 07:56:00 | 2021-09-21 09:54:36 |                         0 |
| B       |             111 | 2021-09-21 09:55:33 | 2021-09-21 10:19:08 |                        57 |
+---------+-----------------+---------------------+---------------------+---------------------------+

我无法访问hadoop进行测试,但如果上述方法对您有效,那么剩下的唯一步骤就是正确格式化时差。

也许像这样:

SELECT user_id
, conversation_id
, start_time
, end_time
, COALESCE(
from_unixtime(
(      unix_timestamp(start_time)
- lag(unix_timestamp(end_time  )) OVER (PARTITION BY user_id ORDER BY start_time)
)
, 'HH:mm:ss'
)
, 0
)  AS time_between_end_newstart
FROM mt_data
ORDER BY user_id, start_time
;

对不起,我也不能测试这个。

以下是我用作测试用例的设置(使用不同的数据库),作为示例:

CREATE TABLE mt_data (
user_id          varchar(10)
, conversation_id  int
, start_time       datetime
, end_time         datetime
);
INSERT INTO mt_data VALUES
('A'      ,123            ,'2021-09-21 00:22:50','2021-09-21 00:28:07')
, ('B'      ,122            ,'2021-09-21 07:56:00','2021-09-21 09:54:36')
, ('B'      ,111            ,'2021-09-21 09:55:33','2021-09-21 10:19:08')
;