Splitting lines that end with '\n' and '\n\n'



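Forgive me if this is not the right place to ask a question like this, but I am struggling to come up with a workable way to split some text.

Here is a sample of the text I am trying to split: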

[Thu Feb  2 12:45:38 2017][428423.3] (file_name:0xcb61) Invalid variable type
call stack:
-----------
[0cb61:+33] larray, r#26, fp(3),
[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)
[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()
[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()
[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)
[14a5b:+4] refcall, fp(-2), string#3103, # from: fp(-2)
[1e24a:main+9664] eop, -, -,
[Thu Feb  2 14:09:07 2017][428423.8] Warning: writing 0 byte file (/the_directory/) to tar archive
[Thu Feb  2 18:55:27 2017][449547.25] Warning: writing 0 byte file (/the_directory/) to tar archive
[Fri Feb  3 12:21:33 2017][451135.3] (file_name:0xcb61) Invalid variable type
call stack:
-----------
[0cb61:+33] larray, r#26, fp(3),
[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)
[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()
[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()
[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)
[14a5b:+4] refcall, fp(-2), string#3103, # from: fp(-2)
[1e24a:main+9664] eop, -, -,

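As you can see, the text above does not really fit any single pattern: some errors are followed by a blank line and some are not. Ideally, what I would like to end up with is something like this...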

[[Thu Feb  2 14:09:07 2017][428423.8] Warning: writing 0 byte file (/the_directory/) to tar archive], [Thu Feb  2 12:45:38 2017][428423.3] (file_name:0xcb61) Invalid variable type \ncall stack:\n-----------\n[0cb61:+33] larray, r#26, fp(3),\n[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)\n[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()\n[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()\n[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)\n[14a5b:+4] refcall, fp(-2), string#3103, # from: fp(-2)\n[1e24a:main+9664] eop, -, -,]

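I could then loop over the result to access each error. Right now I am using some regular expressions to filter out the known good data and then throwing the call stack away, but if possible I would like to be able to store the entire call stack.

This is my current code: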

import re

# Compile the patterns once, outside the loop.
# Matches "(file_name:0xcb61) message": group 1 = file name, group 2 = message.
filename_pattern = re.compile(r'\((\w*.\w*):\w*\)\s(.*$)')
# Matches "[Thu Feb  2 12:45:38 2017][428423.3] message": group 6 = message.
date_pattern = re.compile(r"^\[([a-zA-Z]{3,})\s([a-zA-Z]{3,})\s{2}(\d{1,2})\s(\d{1,2}:\d{1,2}:\d{1,2})\s(\d{4})\]\[\d*.\d*\]\s(.*$)")

with open(local_dump, 'r') as ifile:
    for line in ifile:
        data = date_pattern.search(line)
        if data:
            file_match = filename_pattern.search(data.group(6))
            if file_match:
                print("{0}: {1}".format(file_match.group(1), file_match.group(2)))
        elif "call stack" in line:
            # Only the "call stack:" marker is printed; the frames are dropped.
            print(line.strip())

I can almost get there with this block of code:

with open(local_dump, 'r') as ifile:
    lines = ifile.read()
    for line in lines.split('\n\n'):
        print("LINE: " + line)

The code above does break each call stack out into its own entry, but I run into trouble when a line ends with only a single "\n":

LINE: [Thu Feb  2 12:45:38 2017][428423.3] (file_name:0xcb61) Invalid variable type
call stack:
-----------
[0cb61:+33] larray, r#26, fp(3),
[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)
[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()
[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()
[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)
[14a5b:+4] refcall, fp(-2), string#3103, # from: fp(-2)
[1e24a:main+9664] eop, -, -,
LINE: [Thu Feb  2 14:09:07 2017][428423.8] Warning: writing 0 byte file (/the_directory/) to tar archive
[Thu Feb  2 18:55:27 2017][449547.25] Warning: writing 0 byte file (/the_directory/) to tar archive
[Fri Feb  3 12:21:33 2017][451135.3] (file_name:0xcb61) Invalid variable type
call stack:
-----------
[0cb61:+33] larray, r#26, fp(3),
[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)
[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()
[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()
[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)
[14a5b:+4] refcall, fp(-2), string#3103, # from: fp(-2)
[1e24a:main+9664] eop, -, -,

Here is what the text looks like in a rawer format:

'[Thu Feb  2 14:09:07 2017][428423.8] Warning: writing 0 byte file (/the_directory/) to tar archive \n[Thu Feb  2 18:55:27 2017][449547.25] Warning: writing 0 byte file (/the_directory/) to tar archive \n[Fri Feb  3 12:21:33 2017][451135.3] (file_name:0xcb61) Invalid variable type \ncall stack:\n-----------\n[0cb61:+33] larray, r#26, fp(3), \n[031ff:Mug::Request.preHandlers+17] refcall, fp(1), string#245, # from: fp(1)\n[0339d:Mug::Request.process+77] call, addr(0x80001d), -, # Mug::Request.preHandlers()\n[02ffd:Mug::Request.recv+93] call, addr(0x800026), -, # Mug::Request.process()\n[02d03:Mug::Connection.on_client+101] refcall, fp(0), string#734, # from: fp(0)\n[14a5b:+4] refcall, fp(-2), string#3103, # from: fp(-2)\n[1e24a:main+9664] eop, -, -, '

Thanks for any hints, tips, and help you can offer.

You can split on \n and then remove the empty lines.

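log_text = "your input"
lines = log_text.split("\n")
# Drop empty strings; a list comprehension also works on Python 3,
# where filter() returns an iterator rather than a list.
lines = [line for line in lines if line]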
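Splitting on \n and dropping the blanks gives individual lines, but the question asks to keep each call stack together with its error. A minimal sketch of one way to do that follows, assuming every new entry starts with a timestamp group immediately followed by a numeric group, as in the sample log. The names `HEADER` and `group_records` are made up for illustration.

```python
import re

# Assumption: every new entry starts with two adjacent bracket groups, a
# timestamp and a numeric id, e.g. "[Thu Feb  2 12:45:38 2017][428423.3]".
HEADER = re.compile(r"^\[[^\]]*\]\[[\d.]+\]")


def group_records(lines):
    """Group non-empty lines into records; other lines join the last header."""
    records = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines, like the split-and-filter step
        if HEADER.match(line) or not records:
            records.append([line])  # a header line opens a new record
        else:
            records[-1].append(line)  # call-stack line: attach to current record
    return ["\n".join(record) for record in records]


sample = (
    "[Thu Feb  2 12:45:38 2017][428423.3] (file_name:0xcb61) Invalid variable type \n"
    "call stack:\n"
    "-----------\n"
    "[0cb61:+33] larray, r#26, fp(3),\n"
    "[1e24a:main+9664] eop, -, -,\n"
    "\n"
    "[Thu Feb  2 14:09:07 2017][428423.8] Warning: writing 0 byte file (/the_directory/) to tar archive\n"
)

records = group_records(sample.split("\n"))
for record in records:
    print("RECORD:", record)
```

Because `group_records` accepts any iterable of lines, an open file object can be passed in directly instead of reading the whole log into memory first.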

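If you only want to pull all the error messages out of the log, you could try:

import re

# log_text: the whole log as a single string
matches = re.finditer(r"^\[.*?\]\[.*?\]\s*(.*)$", log_text, re.MULTILINE)
for match in matches:
    print("Error: " + match.group(1))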

This assumes that every error is preceded by two [...] groups.
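The same two-bracket assumption can drive a split instead of a match, so that the call stack stays attached to its message. The following is only a sketch: it assumes the first bracket always holds a ctime-style date such as "Thu Feb  2 12:45:38 2017", and `RECORD_BREAK` and `split_records` are hypothetical names.

```python
import re

# Assumption: a new record begins right after one or more newlines that are
# followed by "[<ctime-style date>][", e.g. "[Thu Feb  2 12:45:38 2017][".
RECORD_BREAK = re.compile(
    r"\n+(?=\[[A-Za-z]{3} [A-Za-z]{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}\]\[)"
)


def split_records(text):
    """Split raw log text into records, keeping call stacks with their error."""
    return [chunk.strip() for chunk in RECORD_BREAK.split(text) if chunk.strip()]


raw = (
    "[Thu Feb  2 14:09:07 2017][428423.8] Warning: writing 0 byte file (/the_directory/) to tar archive \n"
    "[Fri Feb  3 12:21:33 2017][451135.3] (file_name:0xcb61) Invalid variable type \n"
    "call stack:\n-----------\n"
    "[0cb61:+33] larray, r#26, fp(3), \n"
    "[1e24a:main+9664] eop, -, -, \n\n"
)

records = split_records(raw)
for record in records:
    # The first line is the message; anything after it is the call stack.
    header, _, stack = record.partition("\n")
    print("ERROR:", header)
    if stack:
        print(stack)
```

The lookahead leaves the header text in place, so only the newlines between records are consumed. It does not matter whether a record ends in "\n" or "\n\n".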
