Extracting part of a file between specific lines

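I'd like to know how to extract some data from a specific range in a large data file. Is there a way to read the content that starts and ends with particular "keywords"?

I want to read every line between *NODE and **.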

*NODE
13021145,       2637.6073002472617,       55.011929824413045,        206.0394346892517
13021146,       2637.6051226039867,        55.21115693303926,       206.05686503802065
13021147,        2634.226986419154,        54.98263035830583,        205.9520084547658
13021148,        2634.224808775879,       55.181857466932044,       205.96943880353476
**

There are thousands of lines before *NODE and after **...

I know it should look something like this:

a = []
with open('file.txt') as file:
    for line in file:
        if line.startswith('*NODE'):
            # NOW THERE SHOULD FOLLOW SOMETHING LIKE:
            #   Go to next line and "a.append" till there comes the "magical"
            #   "**"

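Any ideas? I'm completely new to Python. Thanks for your help! I hope you understand what I mean.

You're almost there. The only thing missing is that, once you've found the start, you keep looking for the end of the sequence and, until it appears, append each line you iterate over to your list. I.e.: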

data = None  # a placeholder to store your lines
with open("file.txt", "r") as f:  # do not shadow the built-in `file`
    for line in f:  # iterate over the lines
        if data is None:  # we haven't found `*NODE` yet
            if line[:5] == "*NODE":  # search for `*NODE` at the line beginning
                data = []  # make `data` an empty list to begin collecting
        elif line[:2] == "**":  # data initialized, we look for the sequence's end
            break  # no need to iterate over the file anymore
        else:  # data initialized but not at the end...
            data.append(line)  # append the line to our data

Now data will contain the list of lines between *NODE and **, or None if the *NODE line was never found.
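The same state-machine idea can be wrapped in a small function so it works on any iterable of lines, not just an open file. This is only a sketch: the name `read_block` and its `start`/`end` parameters are hypothetical, and unlike the snippet above it strips the trailing newline from each collected line.

```python
import io


def read_block(lines, start="*NODE", end="**"):
    """Return the lines between the start and end markers.

    Returns None if no line begins with `start`. If `end` never
    appears, everything after `start` is returned.
    """
    data = None  # stays None until the start marker is seen
    for line in lines:
        if data is None:
            if line.startswith(start):
                data = []  # start collecting from the next line
        elif line.startswith(end):
            break  # end marker reached, stop reading
        else:
            data.append(line.rstrip("\n"))
    return data


# Illustrative in-memory "file"; a real call would pass an open file object.
sample = io.StringIO("header\n*NODE\n1, 2.0, 3.0, 4.0\n5, 6.0, 7.0, 8.0\n**\nfooter\n")
print(read_block(sample))
```

Because the loop breaks at the end marker, the thousands of lines after ** are never read.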

Try this:

with open('file.txt') as file:
    a = []
    running = False  # avoid NameError when 'if' statement below isn't reached
    for line in file:
        if line.startswith('*NODE'):
            running = True  # show that we are starting to add values
            continue  # make sure we don't add '*NODE'
        if line.startswith('**'):
            running = False  # show that we're done adding values
            continue  # make sure we don't add '**'
        if running:  # only add the values if 'running' is True
            a.extend([i.strip() for i in line.split(',')])

The output is a list containing the following (I used print('\n'.join(a))):

13021145
2637.6073002472617
55.011929824413045
206.0394346892517
13021146
2637.6051226039867
55.21115693303926
206.05686503802065
13021147
2634.226986419154
54.98263035830583
205.9520084547658
13021148
2634.224808775879
55.181857466932044
205.96943880353476
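If numbers are needed rather than a flat list of strings, each line can be converted as it is read. The sketch below assumes that the first field of every line is an integer node ID and the remaining fields are floating-point coordinates; that interpretation, and the helper name `parse_node_line`, are assumptions and not something stated in the question.

```python
def parse_node_line(line):
    """Split one comma-separated line into (node_id, (coordinates...))."""
    fields = [field.strip() for field in line.split(",")]
    node_id = int(fields[0])  # assumed: first field is an integer ID
    coords = tuple(float(value) for value in fields[1:])  # assumed: the rest are floats
    return node_id, coords


line = "13021145,       2637.6073002472617,       55.011929824413045,        206.0394346892517\n"
node_id, coords = parse_node_line(line)
print(node_id, coords)
```

Inside the loop above, a.append(parse_node_line(line)) could replace the a.extend(...) call to keep one entry per node.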

We can iterate over the lines until none are left or we have reached the end of the block, e.g.

a = []
with open('file.txt') as file:
    for line in file:
        if line.startswith('*NODE'):
            # collect block-related lines
            while True:
                try:
                    line = next(file)
                except StopIteration:
                    # there are no lines left
                    break
                if line.startswith('**'):
                    # we've reached the end of block
                    break
                a.append(line)
            # stop iterating over file
            break

which gives us

print(a)
['13021145,       2637.6073002472617,       55.011929824413045,        206.0394346892517\n',
 '13021146,       2637.6051226039867,        55.21115693303926,       206.05686503802065\n',
 '13021147,        2634.226986419154,        54.98263035830583,        205.9520084547658\n',
 '13021148,        2634.224808775879,       55.181857466932044,       205.96943880353476\n']

Alternatively, we can write helper predicates such as

def not_a_block_start(line):
    return not line.startswith('*NODE')

def not_a_block_end(line):
    return not line.startswith('**')

and then use the power of the itertools module, e.g.

from itertools import dropwhile, takewhile

with open('file.txt') as file:
    block_start = dropwhile(not_a_block_start, file)
    # skip block start line
    next(block_start)
    a = list(takewhile(not_a_block_end, block_start))

This gives us the same value for a.
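One caveat with the itertools version: if the file contains no *NODE line, next(block_start) raises StopIteration. A possible variation, sketched here with a hypothetical function name `extract_block`, passes a default to next() and returns an empty list in that case.

```python
import io
from itertools import dropwhile, takewhile


def extract_block(lines, start="*NODE", end="**"):
    """Return the lines between the markers, or [] if `start` is missing."""
    # Skip everything before the first line that begins with `start`.
    block_start = dropwhile(lambda line: not line.startswith(start), lines)
    # Consume the start line itself; the default avoids StopIteration.
    if next(block_start, None) is None:
        return []
    # Collect lines until one begins with `end` (or the input runs out).
    return list(takewhile(lambda line: not line.startswith(end), block_start))


# Illustrative in-memory input; a real call would pass an open file object.
sample = io.StringIO("header\n*NODE\n1, 2.0\n3, 4.0\n**\nfooter\n")
print(extract_block(sample))
```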
