How to join multiple lines of text into one line with a delimiter in Python, for large files (4 GB+)



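I am using this code to join lines, but it does not work for large files. How can I join the lines without loading the whole file into memory? I need to add a "|" delimiter between the lines. The code works fine on small inputs, but not on large files.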

current = None
parts = []  # every record is kept here, so memory grows with the file size
with open('DEFIS.TXT', 'r', encoding="utf-8", errors="ignore") as f:
    for line in f:
        if line.startswith('D1000'):
            # a line starting with D1000 begins a new record
            current = [line.strip()]
            parts.append(current)
        elif current is not None:
            current.append(line.strip())
with open('DEFIS-OUT.TXT', 'w') as f:
    f.write('\n'.join(('|'.join(part) for part in parts)))

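You can read from the input file and write to the output file at the same time, so that only one record is held in memory at once. For example: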

current = []
with open('DEFIS.TXT', 'r') as f_in, open('DEFIS-OUT.TXT', 'w') as f_out:
    for line in map(str.strip, f_in):
        if line.startswith('D1000'):
            # a new record starts: write out the previous one first
            if current:
                print('|'.join(current), file=f_out)
            current = []
        current.append(line)
    # save last chunk (if any):
    if current:
        print('|'.join(current), file=f_out)

If DEFIS.TXT contains:

D1000
1
2
3
D1000
4
5
6
D1000
7
8
9

then DEFIS-OUT.TXT will contain the following after running the script:

D1000|1|2|3
D1000|4|5|6
D1000|7|8|9
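
The same streaming idea can also be written as a generator, which keeps the grouping logic separate from the file handling and makes it easy to try on a small in-memory sample. This is only a minimal sketch: the names join_records and join_file and their marker, sep and encoding parameters are hypothetical, not part of the original question or answer. It assumes that lines appearing before the first D1000 line should be skipped, as in the question's code, and it reuses the question's encoding="utf-8", errors="ignore" settings.

```python
import io


def join_records(lines, marker="D1000", sep="|"):
    """Yield one joined string per record.

    A record starts at a line beginning with `marker` and runs until the
    next such line. Only the current record is held in memory.
    """
    current = None  # stays None until the first marker line is seen
    for raw in lines:
        line = raw.strip()
        if line.startswith(marker):
            if current is not None:
                yield sep.join(current)
            current = [line]
        elif current is not None:
            current.append(line)
        # lines before the first marker are skipped (assumption)
    if current is not None:
        yield sep.join(current)


def join_file(src, dst, marker="D1000", sep="|", encoding="utf-8"):
    """Stream `src` line by line and write one joined record per line to `dst`."""
    with open(src, "r", encoding=encoding, errors="ignore") as f_in, \
         open(dst, "w", encoding=encoding) as f_out:
        for record in join_records(f_in, marker, sep):
            f_out.write(record + "\n")


if __name__ == "__main__":
    # Small in-memory sample standing in for DEFIS.TXT
    sample = io.StringIO("D1000\n1\n2\n3\nD1000\n4\n5\n6\n")
    for record in join_records(sample):
        print(record)
```

Calling join_file('DEFIS.TXT', 'DEFIS-OUT.TXT') would then process the file one line at a time, so memory use depends on the size of the largest single record rather than on the size of the whole file.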
