我在mySQL中有一个日志表,看起来像这样:
mysql> describe logtable;
+----------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| id | int(10) unsigned | NO | | NULL | |
| logdate | datetime | NO | | NULL | |
| host | varchar(100) | NO | | NULL | |
| action | varchar(100) | NO | | NULL | |
| user | varchar(100) | NO | | NULL | |
| org | varchar(100) | NO | | NULL | |
| location | varchar(1000) | NO | | NULL | |
+----------+------------------+------+-----+---------+-------+
这是其内容的样本(并非所有字段):
+----------+---------------------+--------+----------------------+----------+
| id | logdate | action | user | location |
+----------+---------------------+--------+----------------------+----------+
| 13933768 | 2017-01-03 08:42:25 | login | user1@somewhere.com | place1 |
| 13934110 | 2017-01-03 08:58:38 | login | user2@somewhere.com | place2 |
| 13935532 | 2017-01-03 11:02:31 | logout | user1@somewhere.com | place1 |
| 13935622 | 2017-01-03 11:11:25 | logout | user2@somewhere.com | place2 |
| 13935772 | 2017-01-03 11:27:27 | login | user3@somewhere.com | place3 |
| 13935942 | 2017-01-03 11:52:16 | login | user4@somewhere.com | place4 |
| 13936217 | 2017-01-03 12:25:08 | logout | user3@somewhere.com | place3 |
| 13936293 | 2017-01-03 12:33:16 | logout | user4@somewhere.com | place4 |
| 13937676 | 2017-01-03 15:33:59 | login | user3@somewhere.com | place5 |
| 13937859 | 2017-01-03 15:51:53 | logout | user3@somewhere.com | place5 |
| 13942394 | 2017-01-04 08:31:26 | login | user5@somewhere.com | place2 |
| 13943946 | 2017-01-04 09:46:04 | login | user4@somewhere.com | place4 |
| 13944372 | 2017-01-04 10:17:25 | login | user4@somewhere.com | place6 |
| 13944373 | 2017-01-04 10:17:27 | login | user4@somewhere.com | place6 |
| 13944374 | 2017-01-04 10:17:29 | login | user4@somewhere.com | place6 |
| 13944375 | 2017-01-04 10:19:22 | login | user4@somewhere.com | place4 |
| 13944575 | 2017-01-04 10:36:48 | login | user4@somewhere.com | place6 |
| 13946830 | 2017-01-04 14:56:36 | login | user6@somewhere.com | place7 |
| 13947791 | 2017-01-04 16:41:26 | logout | user5@somewhere.com | place2 |
| 13947795 | 2017-01-04 16:41:59 | login | user4@somewhere.com | place4 |
| 13948181 | 2017-01-04 17:19:19 | logout | user4@somewhere.com | place7 |
| 13948200 | 2017-01-04 17:22:18 | logout | user4@somewhere.com | place4 |
| 13948201 | 2017-01-04 17:22:18 | logout | user4@somewhere.com | place6 |
| 13948824 | 2017-01-04 20:23:15 | login | user7@somewhere.com | place8 |
| 13948870 | 2017-01-04 20:44:42 | logout | user7@somewhere.com | place8 |
| 13949945 | 2017-01-05 02:26:35 | logout | user6@somewhere.com | place7 |
| 13951697 | 2017-01-05 08:49:37 | login | user8@somewhere.com | place6 |
| 13951863 | 2017-01-05 08:56:37 | login | user9@somewhere.com | place9 |
| 13951886 | 2017-01-05 08:57:06 | login | user10@somewhere.com | place9 |
+----------+---------------------+--------+----------------------+----------+
我想制作一张表,显示每个用户的累积时间登录到各个位置。从理论上讲,每个人的login
和logout
条目位置组合应成对,但是各种各样的事情,例如客户故障和网络奇怪的情况,有时意味着不匹配。在首先登录之前,也可以登录第二位,因此对彼此可能会重叠。此外,对可能不匹配以下方式:
-
login
事件没有匹配的logout
- 一个没有匹配的
login
的logout
事件(听起来很奇怪,但是该软件会记录像这样的登录 - 仅使用一个相应的
logout
进行多个
login
事件(重试)我已经在user
和location
上加入表格,并从login
次中减去logout
次。我还确保logout
事件的ID高于login
事件,因为它们总是在增加。但是由于可能有多个条目,因此我得到了所有组合。以这个简单的例子:
示例1
+----------+---------------------+--------+----------------------+----------+
| id | logdate | action | user | location |
+----------+---------------------+--------+----------------------+----------+
| 1 | 2017-01-03 08:42:25 | login | user1@somewhere.com | place1 |
| 2 | 2017-01-03 11:02:31 | logout | user1@somewhere.com | place1 |
| 3 | 2017-01-03 11:27:27 | login | user1@somewhere.com | place1 |
| 4 | 2017-01-03 12:25:08 | logout | user1@somewhere.com | place1 |
+----------+---------------------+--------+----------------------+----------+
我天真的方法:
select * from logtable as t1
join logtable as t2
on t1.user = t2.user and t1.location = t2.location
and t1.action = 'login' and t2.action = 'logout'
and t2.id > t1.id
给我3个结果:IDS 1→2、1→4和3→4。而且由于它可能也看起来像这样:
示例2
+----------+---------------------+--------+----------------------+----------+
| id | logdate | action | user | location |
+----------+---------------------+--------+----------------------+----------+
| 1 | 2017-01-03 08:42:25 | login | user1@somewhere.com | place1 |
| 2 | 2017-01-03 08:43:35 | login | user1@somewhere.com | place1 |
| 3 | 2017-01-03 08:44:45 | login | user1@somewhere.com | place1 |
| 4 | 2017-01-03 12:25:08 | logout | user1@somewhere.com | place1 |
+----------+---------------------+--------+----------------------+----------+
这是3次尝试之前的3次尝试,然后进行注销,我只能得到1个结果,而不是3个,所需的答案是ID 3和ID 4之间的区别。
我可以描述我所追求的内容,但不能将其转换为SQL,至少没有多个仅运行几个小时的嵌套选择(日志文件为8500行)。
- 对于每个匹配的
user
和location
,找到任何登录的最后一个,以及的 在登录后登录为的任何注销的 。 - 计算这两个事件之间的时差
- 忽略没有匹配注销的登录名,反之亦然
- 总和每个用户的累积时间
在上面的示例1中,应该给出2个"登录/注销"事件(IDS 1→2 = 2:20:06和3→4 = 0:57:41)汇总到1输出行= 3:17:47,例如2,它应该给出一个"事件"(IDS 3→4 = 3:40:23)汇总到1行。
这应该做技巧:
SELECT
MIN(logdate_logout) AS logdate_logout,logdate_login,user,location,id_login,id_logout
FROM (
SELECT
MAX(logdate_login) AS logdate_login, logdate_logout,user,location,id_login,id_logout
FROM (
SELECT
t1.id AS id_login,
t1.logdate AS logdate_login,
t1.user,
t1.location,
t2.id AS id_logout,
t2.logdate AS logdate_logout
FROM logtable AS t1
JOIN logtable AS t2
ON (t1.user = t2.user AND t1.location = t2.location AND t1.action = 'login' AND t2.action = 'logout' AND t2.id > t1.id)
ORDER BY t1.logdate,t2.logdate
) results_naive_approach
GROUP BY logdate_logout,user,location
) inner_query
GROUP BY logdate_login,user,location
这将确保您只有在匹配登录之前就具有登录名,以及在匹配登录之后的登录名