How can I speed up file parsing and I/O in Python when processing huge files (20GB+)?



Here is the basic example code:

def process(line):
    data = line.split("-|-")
    try:
        data1, data2 = data[2], data[3]
        finalline = f"{data1} some text here {data2}\n"
        # Reopening the output file for every single line is expensive.
        with open("parsed.txt", "a", encoding="utf-8") as wf:
            wf.write(finalline)
    except IndexError:  # skip lines with fewer than 4 fields
        pass

with open("file.txt", "r", encoding="utf-8") as f:
    for line in f:
        process(line)

This all works fine, but is there any way to make it run faster using multiple threads or cores?

Or to somehow saturate the SSD's read/write speed while processing? Any help would be appreciated!

Function calls carry significant overhead in Python. Instead of calling a function for every line of the file, inline the work. Also, don't repeatedly open the same output file; open it once and keep it open.

with open("file.txt", "r", encoding="utf-8") as f, \
        open("parsed.txt", "a", encoding="utf-8") as outh:
    for line in f:
        data = line.split("-|-")
        try:
            print(f"{data[2]} some text here {data[3]}", file=outh)
        except Exception:
            pass
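
To the asker's follow-up about threads and cores: in CPython, threads won't speed up CPU-bound parsing because of the GIL, but multiprocessing can. Below is a minimal sketch, not a drop-in replacement, using multiprocessing.Pool.imap so worker processes do the splitting while the parent process does all the writing. The file names and field indices mirror the code above; the chunksize of 10_000 is a guess you would need to tune.

import multiprocessing as mp

def parse(line):
    # Worker: split the line and return the formatted result,
    # or None for lines with fewer than 4 fields.
    data = line.split("-|-")
    try:
        return f"{data[2]} some text here {data[3]}\n"
    except IndexError:
        return None

if __name__ == "__main__":
    with open("file.txt", "r", encoding="utf-8") as f, \
            open("parsed.txt", "a", encoding="utf-8") as outh, \
            mp.Pool() as pool:
        # imap preserves input order; a large chunksize amortizes
        # the inter-process communication cost over many lines.
        for result in pool.imap(parse, f, chunksize=10_000):
            if result is not None:
                outh.write(result)

Whether this actually wins depends on how heavy the per-line work is; for a trivial split, the cost of pickling lines between processes can outweigh the gains, so benchmark it against the single-process version first.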
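On getting closer to the SSD's throughput: in text mode, Python decodes every byte into str, which is often the real bottleneck rather than the disk. One option, sketched here under the assumption that the input is ASCII/UTF-8 and the substitution needs no decoding, is to stay in binary mode with large explicit buffers:

# Process in binary mode with 1 MiB buffers; avoids per-line str decoding.
with open("file.txt", "rb", buffering=1024 * 1024) as f, \
        open("parsed.txt", "ab", buffering=1024 * 1024) as outh:
    for line in f:
        data = line.split(b"-|-")
        if len(data) > 3:
            # rstrip in case field 3 is the last field and carries the newline
            outh.write(data[2] + b" some text here " + data[3].rstrip(b"\r\n") + b"\n")

If most of each line passes through unchanged, skipping the bytes-to-str round trip alone can get you noticeably closer to raw I/O speed.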
