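I have been trying to write code to process the various log files I deal with every day. I have tried bash, Perl and Python, but so far without much success.
Here is a sample of the log: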
Table TRKGRP1: New table control.
TRKGRP1: 1000 tuples checked. Tuple checking still in progress...
Completed tuple checking.
SUMMARY: Tbl TRKGRP1: tuples checked 1297, passed 1297, failed 0.
Table TOLLTRKS: New table control.
Completed tuple checking.
SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.
Table BRANDOPT: New table control.
Completed tuple checking.
SUMMARY: Tbl BRANDOPT: tuples checked 0, passed 0, failed 0.
Table C7UPTMR: New table control.
Completed tuple checking.
SUMMARY: Tbl C7UPTMR: tuples checked 4, passed 4, failed 3.
Table TOPSCOIN: New table control.
Completed tuple checking.
SUMMARY: Tbl TOPSCOIN: tuples checked 0, passed 0, failed 2.
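What I need is each section of text from "Table" through "failed 1/2/3". I only want to capture the sections that end in failed 1, failed 2 or failed 3; the ones ending in failed 0 are not needed. Keep in mind that these sections are sometimes longer or shorter, not always 3 lines.
Here is the expected output: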
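Table TOLLTRKS: New table control.
Completed tuple checking.
SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.
Table C7UPTMR: New table control.
Completed tuple checking.
SUMMARY: Tbl C7UPTMR: tuples checked 4, passed 4, failed 3.
Table TOPSCOIN: New table control.
Completed tuple checking.
SUMMARY: Tbl TOPSCOIN: tuples checked 0, passed 0, failed 2.
If you could help me out, I would really appreciate it.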
Python: this is not the most efficient approach, but hopefully the algorithm is clear, and it works:
text = '''
Table TRKGRP1: New table control.
TRKGRP1: 1000 tuples checked. Tuple checking still in progress...
Completed tuple checking.
SUMMARY: Tbl TRKGRP1: tuples checked 1297, passed 1297, failed 0.
Table TOLLTRKS: New table control.
Completed tuple checking.
SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.
Table BRANDOPT: New table control.
Completed tuple checking.
SUMMARY: Tbl BRANDOPT: tuples checked 0, passed 0, failed 0.
Table C7UPTMR: New table control.
Completed tuple checking.
SUMMARY: Tbl C7UPTMR: tuples checked 4, passed 4, failed 3.
Table TOPSCOIN: New table control.
Completed tuple checking.
SUMMARY: Tbl TOPSCOIN: tuples checked 0, passed 0, failed 2.
'''
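# Keep the line endings so writelines() below reproduces the lines as-is.
lines = text.splitlines(keepends=True)

Or, from a file:

with open('input.txt') as f:
    lines = f.readlines()

Then:

buf = []       # lines of the block currently being read
show = False   # True until the current block turns out to end in "failed 0"
with open('output.txt', 'w') as f:
    for line in lines:
        if line.startswith('Table'):
            # A new block starts: write out the previous one if it had failures.
            if show:
                f.writelines(buf)
            buf = []
            show = True
        buf.append(line)
        if 'failed 0' in line:
            show = False
    # Write out the last block, which has no following "Table" line.
    if show:
        f.writelines(buf)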
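The same buffering idea can also be wrapped in a function that reads the failure count as a number rather than searching for the literal text 'failed 0'. This is only a sketch: the names failed_blocks, has_failures and FAILED_RE are made up for illustration, and the regular expression assumes the SUMMARY line always ends in "failed <number>." as in the sample log.

```python
import re

# Assumed SUMMARY format, taken from the sample log: "... failed <number>."
FAILED_RE = re.compile(r'failed (\d+)\.\s*$')


def failed_blocks(lines):
    """Yield each 'Table ...' block (a list of lines) whose failed count is > 0."""
    buf = []
    for line in lines:
        if line.startswith('Table ') and buf:
            # A new block starts: emit the previous one if it had failures.
            if has_failures(buf):
                yield buf
            buf = []
        buf.append(line)
    # The last block has no following "Table" line, so check it here.
    if buf and has_failures(buf):
        yield buf


def has_failures(block):
    """Return True if any line of the block reports a non-zero failed count."""
    for line in block:
        m = FAILED_RE.search(line)
        if m and int(m.group(1)) > 0:
            return True
    return False


sample = """\
Table TRKGRP1: New table control.
TRKGRP1: 1000 tuples checked. Tuple checking still in progress...
Completed tuple checking.
SUMMARY: Tbl TRKGRP1: tuples checked 1297, passed 1297, failed 0.
Table TOLLTRKS: New table control.
Completed tuple checking.
SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.
"""

for block in failed_blocks(sample.splitlines(keepends=True)):
    print(''.join(block), end='')
```

Because failed_blocks is a generator, it can be fed an open file object directly instead of a list of lines, so a large log does not need to be read into memory first.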
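Once you split the file into groups of lines, extracting the desired data from each group becomes trivial. The following shows how to split the file into the desired groups.
When you have the whole file in a variable: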
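# A group is a line starting with a non-whitespace character, followed by
# any number of blank or whitespace-indented lines.
while ($file =~ /\G ( \S[^\n]*\n (?:(?:[^\n\S][^\n]*)?\n)* )/xg) {
    process($1);
}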
When reading one line at a time:
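my $buf = '';
while (<>) {
    # A line starting in column 0 opens a new group; flush the previous one.
    # (This relies on the other lines of a group being indented or blank. If
    # every line starts in column 0, test for /^Table / here instead.)
    if (/^\S/) {
        process($buf) if length($buf);
        $buf = '';
    }
    $buf .= $_;
}
process($buf) if length($buf);    # flush the last group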
The process sub is fairly trivial:
sub process {
    for ($_[0]) {    # alias $_ to the group so the matches below apply to it
        print
            if /^Table /
            && /, failed (\d+)\.$/m
            && $1 > 0;
    }
}
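For comparison, the split-into-groups approach on a whole file held in one string could look roughly like this in Python. This is a sketch rather than a translation of the Perl above: it assumes every group starts with a line beginning with "Table " (as in the sample log) instead of relying on indentation, and the names GROUP_RE, FAILED_RE and failed_groups are hypothetical.

```python
import re

# A group is a line starting with "Table " plus every following line
# that does not itself start with "Table ".
GROUP_RE = re.compile(r'^Table [^\n]*(?:\n(?!Table )[^\n]*)*', re.MULTILINE)

# Mirrors the Perl test: ", failed <number>." at the end of a line.
FAILED_RE = re.compile(r', failed (\d+)\.$', re.MULTILINE)


def failed_groups(text):
    """Return the groups in text whose SUMMARY line reports failed > 0."""
    result = []
    for match in GROUP_RE.finditer(text):
        group = match.group(0)
        failed = FAILED_RE.search(group)
        if failed and int(failed.group(1)) > 0:
            result.append(group)
    return result


log = (
    "Table TRKGRP1: New table control.\n"
    "TRKGRP1: 1000 tuples checked. Tuple checking still in progress...\n"
    "Completed tuple checking.\n"
    "SUMMARY: Tbl TRKGRP1: tuples checked 1297, passed 1297, failed 0.\n"
    "Table TOLLTRKS: New table control.\n"
    "Completed tuple checking.\n"
    "SUMMARY: Tbl TOLLTRKS: tuples checked 3, passed 3, failed 1.\n"
)

for group in failed_groups(log):
    print(group.rstrip('\n'))
```

Splitting first and filtering second keeps the two concerns separate: if the rule for what counts as a failure changes, only the check inside failed_groups needs to be touched.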