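I am new to Python.
I want to find profiles in a log file that meet the following conditions:
- the user logs in, changes their password, and logs off within the same second
- these actions (log in, change password, log off) happen one after another, with no other action in between.
The .txt file looks like this: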
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user changed password| -
Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged off| -
Mon, 22 Aug 2016 13:15:42 +0200|178.57.66.225|iukj| - |user logged in| -
Mon, 22 Aug 2016 13:15:40 +0200|178.57.66.215|klij| - |user logged in| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|klij| - |user changed password| -
Mon, 22 Aug 2016 13:15:49 +0200|178.57.66.215|klij| - |user logged off| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged in| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user changed password| -
Mon, 22 Aug 2016 13:15:59 +0200|178.57.66.205|plnb| - |user logged off| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged in| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed password| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user changed profile| -
Mon, 22 Aug 2016 13:17:50 +0200|178.57.66.205|qweq| - |user logged off| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user logged in| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user changed password| -
Mon, 22 Aug 2016 13:19:19 +0200|178.56.66.225|zzad| - |user logged off| -
Mon, 22 Aug 2016 13:20:42 +0200|178.57.67.225|yytr| - |user logged in| -
asdf is a typical profile name in the log file.
This is what I have done so far:
import collections
import time

with open('logfiles.txt') as infile:
    counts = collections.Counter(l.strip() for l in infile)
for line, count in counts.most_common():
    print(line, count)
time.sleep(10)
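I know the logic is to find lines that share the same hour, minute and second, and to print the profile if they repeat. But I am confused about how to get the time out of the file.
Any help is much appreciated.
EDIT: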
The output would be:
asdf
klij
plnb
zzad
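I think this is more complicated than you might imagine. Your sample data is very simple, but the description (the requirements) implies that the log may contain interleaved lines that you need to take into account. So I think this is a case of stepping through the log file sequentially, noting certain actions (log in, log off) and keeping track of what was observed on a previous line. This seems to work for your data: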
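from datetime import datetime as DT, timedelta as TD

FMT = '%a, %d %b %Y %H:%M:%S %z'
td = TD(seconds=1)
prev = None  # (timestamp string, profile) of the pending login, if any

with open('logfile.txt') as logfile:
    for line in logfile:
        # Skip lines that do not have enough fields (:= needs Python 3.8+)
        if len(tokens := line.split('|')) > 4:
            dt, _, profile, _, action, *_ = tokens
            if prev is None or prev[1] != profile:
                # A different profile: only a login starts a new sequence
                prev = (dt, profile) if action == 'user logged in' else None
            else:
                if action == 'user logged off':
                    # Report the profile if the logoff is within one second of the login
                    if DT.strptime(dt, FMT) - DT.strptime(prev[0], FMT) <= td:
                        print(profile)
                    prev = None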
Output:
asdf
plnb
qweq
zzad
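If the requirement is read strictly (login, password change and logoff as three consecutive records for the same profile in the same second), a stricter check might look something like the sketch below. The names `parse`, `strict_matches` and `WANTED` are made up for illustration. It assumes every well-formed line uses the same '|' layout as the sample, and that `%a` and `%b` are parsed under an English locale.

```python
from datetime import datetime

FMT = '%a, %d %b %Y %H:%M:%S %z'
# Hypothetical constant: the exact action sequence we are looking for
WANTED = ['user logged in', 'user changed password', 'user logged off']


def parse(line):
    """Return (timestamp, profile, action), or None for a malformed line."""
    tokens = [t.strip() for t in line.split('|')]
    if len(tokens) < 5:
        return None
    return datetime.strptime(tokens[0], FMT), tokens[2], tokens[4]


def strict_matches(lines):
    """Yield profiles whose login, password change and logoff are three
    consecutive records sharing one profile and one timestamp."""
    records = [r for r in map(parse, lines) if r is not None]
    # Slide a window of three consecutive records over the log
    for window in zip(records, records[1:], records[2:]):
        same_profile = len({r[1] for r in window}) == 1
        same_second = len({r[0] for r in window}) == 1
        actions = [r[2] for r in window]
        if same_profile and same_second and actions == WANTED:
            yield window[0][1]
```

Passing the open file object to `strict_matches` and printing each result should, on the sample data from the question, report asdf, plnb and zzad. qweq drops out because of the profile change in between, and klij because its login is nine seconds before the password change.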
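To parse the time, I would use a regular expression that matches the time expression on each line.
That will do it.
EDIT: I now skip lines that do not match the expected format.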
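import re
# `line` is one line read from the log file; \d matches a digit.
# Named time_str so it does not shadow the `time` module.
time_str = re.search(r'(\d+):(\d+):(\d+)', line).group()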
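For what it's worth, here is a small self-contained sketch of the regex idea applied to one sample line, next to `datetime.strptime` as an alternative when the full timestamp is needed. The helper names `extract_clock` and `parse_stamp` are hypothetical, and the format string assumes the date layout shown in the question (and an English locale for `%a` and `%b`).

```python
import re
from datetime import datetime

TIME_RE = re.compile(r'(\d+):(\d+):(\d+)')
FMT = '%a, %d %b %Y %H:%M:%S %z'


def extract_clock(line):
    """Return the first HH:MM:SS substring of the line, or None if absent."""
    match = TIME_RE.search(line)
    return match.group() if match else None


def parse_stamp(line):
    """Parse the first '|'-separated field as a timezone-aware datetime."""
    return datetime.strptime(line.split('|')[0], FMT)


line = 'Mon, 22 Aug 2016 13:15:39 +0200|178.57.66.225|asdf| - |user logged in| -'
print(extract_clock(line))      # 13:15:39
print(parse_stamp(line).second)  # 39
```

Checking for `None` before calling `.group()` avoids an `AttributeError` on lines that contain no time at all.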
As for the profile name, I would use the split function on the most common lines, as @Matthias suggested; your code would then look like this:
import collections
import time

with open('logfiles.txt') as infile:
    counts = collections.Counter(l.strip() for l in infile)
for line, count in counts.most_common():
    # Split the line at each '|' symbol to get a list of segments.
    # The third element of that list is the profile.
    list_of_segments = line.split('|')
    if len(list_of_segments) == 6:
        print(list_of_segments[2])
time.sleep(10)
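One possible way to combine the split with the counting idea is to count (timestamp, profile) pairs instead of whole lines, roughly as sketched below. The function name `busy_profiles` and the `minimum` parameter are made up for illustration. This is only a rough filter: it assumes the six-field layout from the question and does not check which actions occurred or whether they were consecutive.

```python
from collections import Counter


def busy_profiles(lines, minimum=3):
    """Return profiles that have at least `minimum` log lines sharing
    one timestamp, in order of first appearance."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split('|')
        if len(fields) == 6:  # skip lines that do not match the format
            # Key on the timestamp field and the profile field
            counts[(fields[0], fields[2])] += 1
    return [profile for (stamp, profile), n in counts.items() if n >= minimum]
```

Because the actions are ignored, a profile such as qweq from the sample data would still be reported, so the result would need a further check against the actual action sequence.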