How do I read multiple lines at once from a file?



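The standard Python way of working with files, using the open() function to create a 'file object' f, lets you either load the whole file into memory at once with f.read(), or read the lines one by one with a for loop: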

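with open('filename') as f:
    # 1) Read the whole file into memory at once:
    all_data = f.read()
    # 2) Read lines one-by-one (use one approach or the other:
    #    after f.read() there is nothing left for the loop to read):
    for line in f:
        pass  # Work with each line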

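I am searching several large files for a pattern that may span multiple lines. The most intuitive way to do this is to read line by line looking for the start of the pattern, then load the next few lines to see where it ends: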

with open('large_file') as f:
    # Read lines one-by-one:
    for line in f:
        if line.startswith("beginning"):
            # Load in the next line, i.e.
            nextline = f.getline(line+1)  # ??? #
            # or something

The line I marked with # ??? # is my own pseudocode for what I imagine it should look like.

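My question is: does this exist in Python? Is there any way to access other lines as needed while keeping the cursor at line, without loading the whole file into memory?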

Edit: Judging from the answers here and other reading, the answer is "no".

Something like this:

gather = []
for line in f:
    if gather:
        gather.append(line)
        if "ending" in line:
            process(''.join(gather))
            gather = []
    elif line.startswith("beginning"):
        gather = [line]

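In many cases, though, it is easier to simply load the whole file into a string and search that.

You may want to rstrip the newline from each line before appending it.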
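For the "load everything into a string and search it" route, a multi-line regular expression is one option. The following is only a sketch: the sample text and the "beginning"/"ending" markers are placeholders taken from the question, and a real file would be read with f.read() instead.

```python
import re

# Hypothetical sample text standing in for the result of f.read().
text = "header\nbeginning of block\nmiddle\nthe ending line\nfooter\n"

# re.MULTILINE lets ^ match at the start of every line;
# re.DOTALL lets . match newlines, so one match can span several lines.
# The non-greedy .*? stops at the first "ending" after the start marker,
# and [^\n]* extends the match to the end of that line.
pattern = re.compile(r"^beginning.*?ending[^\n]*", re.MULTILINE | re.DOTALL)

blocks = pattern.findall(text)
print(blocks)
```

This keeps the code short, at the cost of holding the whole file in memory, which is what the question was trying to avoid for very large files.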

Just store the lines you are interested in into a list while going through the file line by line:

# Create a small sample file to work with
with open("file.txt", "w") as f:
    f.write("""
a
b
------
c
d
e
####
g
f""")

interesting_data = []
inside = False
with open("file.txt") as f:
    for line in f:
        line = line.strip()
        # start of interesting stuff
        if line.startswith("---"):
            inside = True
        # end of interesting stuff
        elif line.startswith("###"):
            inside = False
        # adding interesting bits
        elif inside:
            interesting_data.append(line)

print(interesting_data)

Output:

['c', 'd', 'e']
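If a file can contain more than one delimited region, the same flag idea can be wrapped in a generator that yields each region separately. This is a sketch, not part of the answer above: the function name iter_blocks and the sample data are made up for illustration, and io.StringIO stands in for an open file.

```python
import io

def iter_blocks(lines, start="---", end="###"):
    """Yield one list of stripped lines per start/end delimited region."""
    block = None  # None means we are currently outside a region
    for line in lines:
        line = line.strip()
        if line.startswith(start):
            block = []            # entering a region
        elif line.startswith(end):
            if block is not None:
                yield block       # leaving a region: hand it to the caller
            block = None
        elif block is not None:
            block.append(line)    # inside a region: keep the line

# Hypothetical sample input; any iterable of lines works, including a file object.
sample = io.StringIO("a\nb\n------\nc\nd\ne\n####\ng\n---\nh\n###\n")
blocks = list(iter_blocks(sample))
print(blocks)
```

Because it is a generator, only the lines of the region currently being collected are held in memory at any time.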

I think you are looking for .readline(), which does exactly that. Here is a sketch that advances to the line where the pattern begins.

with open('large_file') as f:
    line = f.readline()
    while not line.startswith("beginning"):
        line = f.readline()

        # end of file
        if not line:
            print("EOF")
            break

    # do_something with line, get additional lines by
    # calling .readline() again, etc.
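The "get additional lines" step can be sketched as follows. This variant uses next(f, "") on the file iterator rather than .readline(), so that it can be combined with a for loop; the function name find_block and the sample data are hypothetical, and io.StringIO stands in for a real file.

```python
import io

def find_block(f, start="beginning", end="ending"):
    """Return the lines from the first start line through the first line
    containing the end marker, or None if no start line is found."""
    for line in f:
        if line.startswith(start):
            gathered = [line]
            # Pull further lines from the same iterator until the end marker.
            while end not in line:
                line = next(f, "")
                if not line:      # reached EOF before an end marker appeared
                    break
                gathered.append(line)
            return gathered
    return None

# Hypothetical sample input in place of open('large_file').
sample = io.StringIO("x\nbeginning here\nmid\nthe ending\nrest\n")
result = find_block(sample)
print(result)
```

Only the lines of the current block are kept in memory, and the file position ends up just past the block, so the function can be called again on the same file object to look for the next one.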
