Parsing a specific file with Python

>I have a 300 MB file containing lines like this

[0] ppt: (non-cluster) 4294967295 1 1.9.1.25
timestamp: 1355333137
states: 680 [138(average 2752 0)][139(average 2802 0)][2253(average 2008 0)][2484(average 2321 0)][2578(average 2792 0)][2615(average 3518 0)]
[1] ppt: (non-cluster) 4294967295 1 1.9.1.26
timestamp: 1355333137
states: 676 [138(average 2761 0)][139(average 2777 0)][2253(average 2075 0)][2484(average 2318 0)][2578(average 2792 0)][2615(average 3522 0)]

I would appreciate suggestions on how to parse this file with Python to produce a list of dictionaries, for example

1.9.1.25 ( 138: 2752, 139: 2802, 2253: 2008, 2484: 2321, 2578: 2792, 2615: 3518)
1.9.1.26 ( 138: 2761, 139: 2777, 2253: 2075, 2482: 2318, 2578: 2793, 2615: 3522)

and store the list in a file.

Thanks

It's not very elegant, but here you go:

import re

# Each record spans three lines: a "[n] ppt: ..." header, a timestamp, and a states line.
start_ln = re.compile(r'\[\d+\] ppt: \(.*?\) \d+ \d+ (?P<ivar>\d+\.\d+\.\d+\.\d+)')
tstamp_ln = re.compile(r'timestamp: \d+')
state_ln = re.compile(r'states: (?P<pcount>\d+) (?P<ggroup>(\[\d+\(average \d+ \d+\)\])+)')
group_p = re.compile(r'\[(?P<st>\d+)\(average (?P<avg>\d+) \d+\)\]')

state = 'WAIT'   # which line of the record we expect next
llist = []       # one dictionary per record
ldict = {}
cvar = None      # dotted address taken from the current header line
with open('pfile', 'r') as f:
    for ln in f:
        if state == 'WAIT':
            mtch = start_ln.match(ln)
            if mtch is not None:
                cvar = mtch.group('ivar')
                ldict = {}
                state = 'LINE#1'
                continue
        elif state == 'LINE#1':
            # The timestamp is not part of the requested output, so it is only skipped.
            mtch = tstamp_ln.match(ln)
            if mtch is not None:
                state = 'LINE#2'
                continue
        elif state == 'LINE#2':
            mtch = state_ln.match(ln)
            if mtch is not None:
                groupline = mtch.group('ggroup')
                # findall returns (state id, average) string pairs
                mtch2 = group_p.findall(groupline)
                ldict[cvar] = {int(st): int(avg) for st, avg in mtch2}
                cvar = None
                state = 'WAIT'
                # append only once a complete record has been parsed
                llist.append(ldict)

for i in llist:
    print(i)

There is no error checking at all, and the "state notation" is a bit crude, but it should do the trick.
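
The question also asks for the result to be stored in a file, and the code above has no error checking. A minimal sketch of one way to cover both, assuming every record has the three-line layout shown in the question: the names `parse_records` and `dump_records` are made up for this illustration, and the one-JSON-object-per-line output format is an assumption, not something the question specifies.

```python
import json
import re

# Patterns mirror the sample records in the question.
HEADER = re.compile(r'\[\d+\] ppt: \(.*?\) \d+ \d+ (?P<ip>\d+\.\d+\.\d+\.\d+)')
GROUP = re.compile(r'\[(?P<st>\d+)\(average (?P<avg>\d+) \d+\)\]')


def parse_records(lines):
    """Yield (address, {state_id: average}) pairs, one per record."""
    current = None
    for lineno, line in enumerate(lines, 1):
        header = HEADER.match(line)
        if header:
            if current is not None:
                raise ValueError('line %d: record %s has no states line'
                                 % (lineno, current))
            current = header.group('ip')
        elif line.startswith('states:'):
            if current is None:
                raise ValueError('line %d: states line without a header'
                                 % lineno)
            yield current, {int(st): int(avg)
                            for st, avg in GROUP.findall(line)}
            current = None
        # timestamp lines and anything else are ignored


def dump_records(lines, out):
    """Write one JSON object per record to the file-like object `out`.

    JSON object keys are always strings, so the integer state ids
    come back as strings when the file is read again.
    """
    for ip, averages in parse_records(lines):
        out.write(json.dumps({ip: averages}) + '\n')
```

Because `parse_records` is a generator, the 300 MB input is read one line at a time rather than loaded whole. It could be used by opening the input and an output file (say `pfile` and a hypothetical `out.jsonl`) and calling `dump_records(src, dst)`.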
