Script/regex for editing log files



I have been trying to write code to process the various log files I deal with every day. I have tried bash, perl and python, but so far without much success.

Here is a sample of the log:

Table TRKGRP1: New table control.
      TRKGRP1: 1000 tuples checked. Tuple checking still in progress...
      Completed tuple checking.
      SUMMARY: Tbl TRKGRP1: tuples checked 1297, passed 1297, failed 0.
Table TOLLTRKS: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.
Table BRANDOPT: New table control.
      Completed tuple checking.
      SUMMARY: Tbl BRANDOPT: tuples checked 0, passed 0, failed 0.
Table C7UPTMR: New table control.
      Completed tuple checking.
      SUMMARY: Tbl C7UPTMR: tuples checked 4, passed 4, failed 3.
Table TOPSCOIN: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOPSCOIN: tuples checked 0, passed 0, failed 2.

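What I need is the section of text from "Table" to "failed 1/2/3". I only want to capture the sections that end with failed 1, failed 2 or failed 3; the ones ending with failed 0 are not needed. Keep in mind that these sections are sometimes longer or shorter and are not always 3 lines.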

Here is the expected output:

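Table TOLLTRKS: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.
Table C7UPTMR: New table control.
      Completed tuple checking.
      SUMMARY: Tbl C7UPTMR: tuples checked 4, passed 4, failed 3.
Table TOPSCOIN: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOPSCOIN: tuples checked 0, passed 0, failed 2.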

I would really appreciate it if you could help me with this.

Python. This is not the most efficient approach, but hopefully the algorithm is clear, and it works:

text = '''
Table TRKGRP1: New table control.
      TRKGRP1: 1000 tuples checked. Tuple checking still in progress...
      Completed tuple checking.
      SUMMARY: Tbl TRKGRP1: tuples checked 1297, passed 1297, failed 0.
Table TOLLTRKS: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.
Table BRANDOPT: New table control.
      Completed tuple checking.
      SUMMARY: Tbl BRANDOPT: tuples checked 0, passed 0, failed 0.
Table C7UPTMR: New table control.
      Completed tuple checking.
      SUMMARY: Tbl C7UPTMR: tuples checked 4, passed 4, failed 3.
Table TOPSCOIN: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOPSCOIN: tuples checked 0, passed 0, failed 2.
'''
# Keep the line endings so that writelines() below produces separate lines.
lines = text.splitlines(keepends=True)

Or, from a file:

with open('input.txt') as f:
    lines = f.readlines()

Then:

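buf = []       # lines of the current "Table" block
show = False   # True while the current block should be written out
with open("output.txt", 'w') as f:
    for line in lines:
        if line.startswith('Table'):
            # A new block starts: flush the previous one unless it was "failed 0".
            if show:
                f.writelines(buf)
            buf = []
            show = True
        buf.append(line)
        if 'failed 0' in line:
            show = False
    # Flush the last block.
    if show:
        f.writelines(buf)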
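
The same buffering idea can also be wrapped in a small generator that parses the failure count rather than searching for the literal text "failed 0". This is only a sketch: the name failed_blocks and the pattern FAILED_RE are assumptions made for this example, and it assumes every SUMMARY line ends with "failed N." as in the sample log. Unlike the loop above, it keeps a block only when a positive count is actually found, so a block with no SUMMARY line is dropped.

```python
import re

# Assumed shape of the failure count at the end of a SUMMARY line.
FAILED_RE = re.compile(r"failed (\d+)\.")


def failed_blocks(lines):
    """Yield each 'Table ...' block (a list of lines) whose failed count is > 0."""
    buf = []
    keep = False
    for line in lines:
        if line.startswith("Table"):
            # A new block starts: emit the previous one if it had failures.
            if keep:
                yield buf
            buf = []
            keep = False
        buf.append(line)
        m = FAILED_RE.search(line)
        if m and int(m.group(1)) > 0:
            keep = True
    # Emit the last block if it had failures.
    if keep:
        yield buf


sample = """\
Table TRKGRP1: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TRKGRP1: tuples checked 1297, passed 1297, failed 0.
Table TOLLTRKS: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.
"""

for block in failed_blocks(sample.splitlines(keepends=True)):
    print("".join(block), end="")
```

Because the count is converted to an integer, a value such as "failed 10" is handled the same way as "failed 1".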

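Once the file is split into groups of lines, extracting the data you need from each group becomes simple. The following shows how to split the file into the required groups.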

When you have the whole file in a variable:

while ($file =~ /\G ( \S[^\n]*\n (?:(?:[^\n\S][^\n]*)?\n)* )/xg) {
   process($1);
}

When reading one line at a time:

my $buf;
while (<>) {
   if (/^\S/) {
      process($buf) if length($buf);
      $buf = '';
   }
   $buf .= $_;
}
process($buf) if length($buf);

process is fairly trivial.

sub process {
   for ($_[0]) {
      print
         if /^Table /
         && /, failed (\d+)\.$/m
         && $1 > 0;
   }
}
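
For readers more comfortable with Python, the whole-file variant above might be sketched roughly as follows. The names GROUP_RE, SUMMARY_RE and failed_groups are made up for this example, and the sketch assumes the text ends with a newline (otherwise the final line would not be included in its group).

```python
import re

# A group is a line starting with a non-space character, followed by any
# number of lines that are empty or start with whitespace.
GROUP_RE = re.compile(r"^\S.*\n(?:(?:[^\S\n].*)?\n)*", re.M)

# Failure count at the end of a line inside a group.
SUMMARY_RE = re.compile(r", failed (\d+)\.$", re.M)


def failed_groups(text):
    """Return the 'Table ...' groups whose failed count is greater than 0."""
    result = []
    for match in GROUP_RE.finditer(text):
        group = match.group(0)
        summary = SUMMARY_RE.search(group)
        if group.startswith("Table ") and summary and int(summary.group(1)) > 0:
            result.append(group)
    return result


log = """\
Table TOLLTRKS: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.
Table BRANDOPT: New table control.
      Completed tuple checking.
      SUMMARY: Tbl BRANDOPT: tuples checked 0, passed 0, failed 0.
Table TOPSCOIN: New table control.
      Completed tuple checking.
      SUMMARY: Tbl TOPSCOIN: tuples checked 0, passed 0, failed 2.
"""

print("".join(failed_groups(log)), end="")
```

As in the Perl version, the grouping step knows nothing about tables or failure counts; the filtering is done separately on each group.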
