Python - search a text file for a specific time range (equivalent of sed -n)



I'm trying to write a Python script that outputs a specific time range from a log file (similar to the sed command listed below):

sed -n '/2017-01-26 18:00/ , /2017-01-26 18:02/p' /logfile.log
2017-01-26 18:00:00
2017-01-26 18:01:01
2017-01-26 18:01:02
2017-01-26 18:01:09
2017-01-26 18:01:09
2017-01-26 18:01:11
2017-01-26 18:02:01

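My Python script only searches for fixed strings, unlike the sed command above (I suspect I'm doing something wrong, but I can't find the mistake; please check the code below):

Please point out where I should change the code, and suggest improvements. Thanks!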

#!/usr/bin/python
import datetime, time, os, sys, re
from datetime import timedelta
counter = 0
avgtime = 0
now = datetime.datetime.utcnow()
pasttime = now - datetime.timedelta(minutes=5)
timestamp = now.strftime("%y%m%d")
fiveago   = now - timedelta(minutes=5,seconds=now.second)
current   = now.strftime("%Y-%m-%d %H:%M")
pasttime  = fiveago.strftime("%Y-%m-%d %H:%M")
pattern   = str(current + "|" + pasttime)
f = open('/logs/' + sys.argv[1] + '/' + 'u_ex' + timestamp + '.log', 'r')
for line in f:
        if "POST" in line:
                if re.search(pattern, line, re.IGNORECASE):
                        date = line.split(' ')[1]
                        time = line.split(' ')[14]
                        avgtime += int(time)
                        counter += 1
                        print(date,time)
f.close()
print(pattern)
print("Total amount of time: ",counter)
print("Total scan time: ",avgtime)
print("Average scan time: ",avgtime / counter)

IIUC, you want the log entries that fall between the timestamps you pass in.

import datetime, sys
from datetime import timedelta
now = datetime.datetime.utcnow()
timestamp = now.strftime("%y%m%d")
# Start of the window: five minutes back, rounded down to the whole minute
fiveago   = now - timedelta(minutes=5,seconds=now.second)
current   = now.strftime("%Y-%m-%d %H:%M")
pasttime  = fiveago.strftime("%Y-%m-%d %H:%M")
print("Start time: ", pasttime, "End time: ", current, "\n\n")
filename ='/logs/' + sys.argv[1] + '/' + 'u_ex' + timestamp + '.log'
with open(filename, 'r') as f:
    contents = f.readlines()
for line in contents:
    if "POST" in line:
        # Field positions follow the sample input below: POST <date> <time> <rest>
        date = line.split(' ')[1]
        logtime = line.split(' ')[2]
        logdatetime=date+" "+logtime
        # Timestamps in this format sort lexicographically, so plain string comparison works
        if logdatetime <= current and logdatetime >= pasttime:
            print("yes, within the interval : ", logdatetime)

Output

Start time:  2017-01-26 20:23 End time:  2017-01-26 20:28 

yes, within the interval :  2017-01-26 20:23:20
yes, within the interval :  2017-01-26 20:23:01
yes, within the interval :  2017-01-26 20:23:02

Input used for this

POST 2017-01-26 20:23:20 XX
POST 2017-01-26 20:23:01 XC
POST 2017-01-26 20:23:02 CV
POST 2017-01-26 20:20:09 DAF
POST 2017-01-26 20:20:09 fASF
POST 2017-01-26 20:20:11 Sfas
POST 2017-01-26 20:20:01 fsAf
POST 2017-01-26 20:20:02 asf
POST 2017-01-26 20:20:03 asf
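
The comparison above works on strings. As a minimal sketch of a variant, the timestamps can be parsed into datetime objects and compared as such. The function name posts_in_window and the field layout (POST, date, time, rest) are assumptions taken from the sample input above, not from the real log format:

```python
from datetime import datetime


def posts_in_window(lines, start, end):
    """Return the timestamps of POST lines with start <= timestamp <= end."""
    hits = []
    for line in lines:
        fields = line.split()
        # Assumed layout (from the sample input): POST <date> <time> <rest>
        if len(fields) < 3 or fields[0] != "POST":
            continue
        stamp = datetime.strptime(fields[1] + " " + fields[2], "%Y-%m-%d %H:%M:%S")
        if start <= stamp <= end:
            hits.append(stamp)
    return hits


# A few lines copied from the sample input
sample = [
    "POST 2017-01-26 20:23:20 XX",
    "POST 2017-01-26 20:23:01 XC",
    "POST 2017-01-26 20:20:09 DAF",
]
start = datetime(2017, 1, 26, 20, 23)
end = datetime(2017, 1, 26, 20, 28)
for stamp in posts_in_window(sample, start, end):
    print("yes, within the interval :", stamp)
```

Comparing datetime objects avoids relying on the textual sort order of the timestamps, at the cost of a strptime call per line.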

I don't understand what the problem is, but you asked for the equivalent of your sed command, so here is an exact translation into Python:

import sys, re
use = False
for line in open('/logfile.log'):
   if re.search('2017-01-26 18:00', line): use = True
   if use: sys.stdout.write(line)
   if re.search('2017-01-26 18:02', line): use = False

The problem with your solution is that you only look for the two "edge times". In your 3-minute time range example, these are 18:00 and 18:02.
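
To see this concretely, here is a small sketch that applies an alternation of the two edge times to a few timestamps from the sed output in the question. The lines list is sample data only, not a real log:

```python
import re

# The two "edge times" joined with "|", as the question's script builds them
pattern = "2017-01-26 18:00|2017-01-26 18:02"

lines = [
    "2017-01-26 18:00:00",
    "2017-01-26 18:01:01",
    "2017-01-26 18:01:09",
    "2017-01-26 18:02:01",
]

matched = [line for line in lines if re.search(pattern, line)]
missed = [line for line in lines if line not in matched]

print("matched:", matched)
print("missed: ", missed)
```

Every 18:01 line ends up in missed, because neither alternative of the pattern occurs in it.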

What the sed command does is:

sed -n '/2017-01-26 18:00/ , /2017-01-26 18:02/p' /logfile.log
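  1. Loop over the lines without printing them (-n)
  2. As soon as sed finds 2017-01-26 18:00, it starts printing all lines
  3. As soon as sed finds 2017-01-26 18:02, it prints that line and then stops printing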
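
The three steps above can be sketched as a small generator. The name sed_range and the log list are made up for illustration; the sketch only covers the start/stop behaviour described here, not sed in general:

```python
import re


def sed_range(lines, start, end):
    """Yield lines from a match of `start` through the next match of `end`, inclusive."""
    printing = False
    for line in lines:
        if not printing:
            if re.search(start, line):
                printing = True
                yield line
        else:
            yield line
            # The end pattern is only checked on lines after the start line
            if re.search(end, line):
                printing = False


log = [
    "2017-01-26 17:59:58 a",
    "2017-01-26 18:00:00 b",
    "2017-01-26 18:01:09 c",
    "2017-01-26 18:02:01 d",
    "2017-01-26 18:02:30 e",
]
selected = list(sed_range(log, "2017-01-26 18:00", "2017-01-26 18:02"))

# Same log, but without any line from minute 18:00
no_start = [line for line in log if "18:00" not in line]
selected_without_start = list(sed_range(no_start, "2017-01-26 18:00", "2017-01-26 18:02"))
```

Here selected stops at the first 18:02 line, so the later 18:02:30 entry is left out, and selected_without_start is empty because the start pattern never matches.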

In your example, your regex pattern is:

2017-01-26 18:00|2017-01-26 18:02

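and it will only find 18:00 and 18:02. So you can do one of the following:

  1. Parse the date out of the line and compare it with the time range, as in Shijo's answer
  2. Emulate sed, as in theamk's answer, but note: this only works if both "edge timestamps" are present in the file
  3. Pimp your regex so that it also searches for the times in between: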

    pattern = "|".join([(now-timedelta(minutes=i)).strftime("%Y-%m-%d %H:%M") for i in range(6)])
    

    This would produce, for example:

    '2016-01-26 18:00|2016-01-26 17:59|2016-01-26 17:58|2016-01-26 17:57|2016-01-26 17:56|2016-01-26 17:55'
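
As a rough sketch of how such a pattern could be used, the snippet below wraps the join in a helper and filters a few lines with it. The helper name minute_pattern, the fixed end time, and the sample lines are assumptions for illustration:

```python
import re
from datetime import datetime, timedelta


def minute_pattern(end, minutes):
    """Build an alternation with one 'YYYY-MM-DD HH:MM' stamp per minute, newest first."""
    stamps = [
        (end - timedelta(minutes=i)).strftime("%Y-%m-%d %H:%M")
        for i in range(minutes)
    ]
    return "|".join(stamps)


# Fixed end time instead of utcnow(), so the result is reproducible
pattern = minute_pattern(datetime(2016, 1, 26, 18, 0), 6)
regex = re.compile(pattern)

lines = [
    "2016-01-26 17:57:30 POST",
    "2016-01-26 17:54:59 POST",
    "2016-01-26 18:00:10 POST",
]
hits = [line for line in lines if regex.search(line)]
print(hits)
```

Compiling the pattern once keeps the per-line cost low; for long windows the alternation grows by one stamp per minute, at which point parsing the timestamps and comparing them may be the simpler choice.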
    
