I want to write a small Python script to plot some .dat files. To do that, I first need to process the files. The .dat files look like this:
(Real64
(numDims 1)
(size 513)
(data
[ 90.0282291905089 90.94377050431068 92.31708247501335 93.38521400778211 94.60593575951782 95.67406729228657 97.04737926298925 97.96292057679104 ...]
)
)
I want to remove the text and the "normal" (round) brackets. I only need the data between [.....].
I tried something like this:
from Tkinter import Tk
from tkFileDialog import askopenfilename
# just a small GUI to get the file
Tk().withdraw()
filename = askopenfilename()
import numpy as np
with open(filename) as f:
    temp = f.readlines(5) #this is the line in the .dat file
for i in range(len(temp)-1):
    if type(temp[i]) == str:
        del temp[i]
However, this always gives an "index out of range" error. Any help would be appreciated.
"I only need the data between [.....]"
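Two things in this attempt explain the error. First, `readlines(5)` does not pick out line 5: its argument is a size hint (roughly how many bytes/characters of whole lines to read), not a line number. Second, every element returned by `readlines()` is a `str`, so the `if` is always true, and each `del temp[i]` shortens the list while `range(len(temp)-1)` was computed once from the original length, so the loop eventually indexes past the end. The sketch below reproduces that on a hard-coded copy of the sample lines and shows the usual fix of building a new list instead of deleting in place; `SAMPLE_LINES`, `delete_by_index` and `keep_bracket_lines` are hypothetical names made up for illustration.

```python
# Hypothetical, hard-coded copy of the sample file's lines (data shortened)
SAMPLE_LINES = [
    "(Real64\n",
    "(numDims 1)\n",
    "(size 513)\n",
    "(data\n",
    "[ 90.0282291905089 90.94377050431068 ]\n",
    ")\n",
    ")\n",
]


def delete_by_index(lines):
    """Hypothetical helper: the loop from the question, run on a copy of `lines`."""
    lines = list(lines)
    # range() is evaluated once, using the ORIGINAL length (7 -> indices 0..5)
    for i in range(len(lines) - 1):
        if type(lines[i]) == str:  # always True for lines read from a text file
            del lines[i]           # the list shrinks, so later indices overshoot
    return lines


def keep_bracket_lines(lines):
    """Hypothetical helper: build a new list instead of deleting while looping."""
    return [line for line in lines if line.lstrip().startswith("[")]


try:
    delete_by_index(SAMPLE_LINES)
except IndexError as exc:
    print("IndexError: %s" % exc)

print(keep_bracket_lines(SAMPLE_LINES))
```

The list comprehension never mutates the list it is iterating over, which is why it avoids the problem entirely.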
# treat the whole thing as a string
temp = '''(Real64
(numDims 1)
(size 513)
(data
[ 90.0282291905089 90.94377050431068 92.31708247501335 ]
)
)'''
# split() at open bracket; take everything right
# then split() at close bracket; take everything left
# strip() trailing / leading white space
number_string = temp.split('[')[1].split(']')[0].strip()
# convert to a list of floats, because I expect you'll need to;
# split() with no argument also copes with runs of spaces or newlines
number_list = [float(i) for i in number_string.split()]
print number_string
print number_list
# output:
# 90.0282291905089 90.94377050431068 92.31708247501335
# [90.0282291905089, 90.94377050431068, 92.31708247501335]
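To run the same split-at-the-brackets idea on the file itself, it can be wrapped in a function. This is a sketch that assumes the file holds a single [...] block; `parse_bracket_values` and `read_bracket_values` are hypothetical names, and `str.index()` is used to locate the brackets because it raises `ValueError` when one is missing.

```python
def parse_bracket_values(text):
    """Return the floats between the first '[' and the next ']' in `text`."""
    start = text.index("[") + 1     # raises ValueError if there is no '['
    end = text.index("]", start)    # raises ValueError if it is never closed
    # split() with no argument splits on any run of whitespace, including
    # newlines, so numbers wrapped over several lines are still handled
    return [float(token) for token in text[start:end].split()]


def read_bracket_values(path):
    """Read the whole .dat file at `path` and parse its bracketed block."""
    with open(path) as f:
        return parse_bracket_values(f.read())


sample = """(Real64
(numDims 1)
(size 513)
(data
[ 90.0282291905089 90.94377050431068
  92.31708247501335 ]
)
)"""
print(parse_bracket_values(sample))
```

With the file chosen in the question, this would be `values = read_bracket_values(filename)`, and `np.array(values)` then gives an array ready for plotting.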
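import re
print re.findall(r"\[([0-9. ]+)\]", open(filename).read())
This is called a regular expression. It says: find me everything between two square brackets that is made up of digits, periods and spaces.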
\[        # literal left bracket
(         # capture the stuff in here
[0-9. ]   # accept 0-9 and . and space
+         # at least one ... probably more
)         # end capture group
\]        # literal close bracket
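`re.findall` returns the captured text as strings, so the numbers still need converting. A minimal sketch, assuming the file has one bracketed block containing only plain decimal numbers (minus signs or exponents such as `1e-05` would not match this character class and would need `-`, `+`, `e`, `E` added to it); unlike the pattern above, it uses `\s` instead of a literal space so that numbers spread over several lines are captured too. `BRACKETED` and `regex_values` are hypothetical names.

```python
import re

# \[ and \] are escaped literal brackets; \s accepts spaces, tabs and newlines
BRACKETED = re.compile(r"\[([0-9.\s]+)\]")


def regex_values(text):
    """Return the floats inside the first [...] block of `text`, or [] if none."""
    match = BRACKETED.search(text)
    if match is None:
        return []
    return [float(token) for token in match.group(1).split()]


sample = "(data\n[ 90.0282291905089 90.94377050431068\n 92.31708247501335 ]\n)"
print(regex_values(sample))
```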
Alternatively, you could use something like pyparsing:
inputdata = '''(Real64
(numDims 1)
(size 513)
(data
[ 90.0282291905089 90.94377050431068 92.31708247501335 93.38521400778211 94.60593575951782 95.67406729228657 97.04737926298925 97.96292057679104 ...]
)
)
'''
from pyparsing import OneOrMore, nestedExpr
data = OneOrMore(nestedExpr()).parseString(inputdata)
print "GOT:", data[0][-1][2:-1]
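If installing pyparsing is not an option, the same nested-parentheses idea can be sketched with the standard library. This is not how pyparsing works internally, just a small recursive parser under the assumption that the file uses round brackets for groups and square brackets only for the number list; `TOKEN`, `parse_nested` and the sample text (with `size` shrunk to 3 to match the shortened data) are made up for illustration. Because it turns the `[...]` block into floats, the `size` field can be checked against the number of values.

```python
import re

# A token is a single bracket, or any run of characters that is neither
# whitespace nor a bracket
TOKEN = re.compile(r"[()\[\]]|[^\s()\[\]]+")


def parse_nested(text):
    """Parse '(...)' into nested lists and '[...]' into a list of floats."""
    tokens = iter(TOKEN.findall(text))

    def parse_until(closer):
        items = []
        for tok in tokens:  # the shared iterator is also consumed by recursive calls
            if tok == closer:
                return items
            if tok == "(":
                items.append(parse_until(")"))
            elif tok == "[":
                items.append([float(t) for t in parse_until("]")])
            else:
                items.append(tok)
        if closer is not None:
            raise ValueError("unbalanced input: missing %r" % closer)
        return items

    return parse_until(None)


sample = """(Real64
(numDims 1)
(size 3)
(data
[ 90.0282291905089 90.94377050431068 92.31708247501335 ]
)
)"""
tree = parse_nested(sample)[0]  # ['Real64', ['numDims', '1'], ['size', '3'], ['data', [...]]]
fields = dict((group[0], group[1:]) for group in tree[1:])
values = fields["data"][0]
# sanity check: the declared size should match the number of parsed values
assert int(fields["size"][0]) == len(values)
print(values)
```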