Python: compare two arrays and save to a text file



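My goal is to compare "all.txt" against "blacklist.txt", remove any matching entries, and save the result to "all_cleaned.txt". However, my current code is very slow. I have to do this for millions of records, and unfortunately it is not fast enough. Any suggestions for speeding it up would be much appreciated.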

import os

input_file = "all.txt"
blacklist_file = "blacklist.txt"
i = 0
with open(input_file, "r") as fp:
    lines = fp.readlines()
    new_lines = []
    for line in lines:
        line = line.strip().lower()
        print(str(i) + ": " + line)
        i = i + 1
        # membership test on a list scans the whole list every time
        if line not in new_lines:
            new_lines.append(line)

output_file = "all_cleaned.txt"
print("Writing data ...")
with open(output_file, "a+") as fp:
    fp.write("\n".join(new_lines).lower())

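You can use the set data structure. Membership tests and set difference on a set are much faster than scanning a list.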

input_file = "all.txt"
blacklist_file = "blacklist.txt"
output_file = "all_cleaned.txt"

# files are iterable. the elements are the lines of the file.
input_set = set(open(input_file))
blacklist_set = set(open(blacklist_file))
# note: sets are unordered, so the original line order is not preserved
okay_set = input_set - blacklist_set
with open(output_file, "a") as fp:  # a is append, using that since the question did
    fp.writelines(okay_set)
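
If the order of the lines matters, or the two files differ in case or trailing whitespace, a variant along the following lines might help. This is only a sketch: the function name `filter_blacklisted` is made up for illustration, and it assumes the same normalisation the question uses (`strip()` plus `lower()`), UTF-8 files, and that duplicates in the input should be dropped as in the question's code. It opens the output in "w" mode rather than appending.

```python
def filter_blacklisted(input_path, blacklist_path, output_path):
    """Copy non-blacklisted lines to output_path, keeping first-seen order.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    # Load the blacklist once into a set so each lookup is O(1) on average.
    with open(blacklist_path, encoding="utf-8") as fp:
        blacklist = {line.strip().lower() for line in fp}

    seen = set()  # entries already written, used to drop duplicates
    kept = 0
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        # Stream the input line by line instead of loading it all at once.
        for line in src:
            entry = line.strip().lower()
            if entry in blacklist or entry in seen:
                continue
            seen.add(entry)
            dst.write(entry + "\n")
            kept += 1
    return kept
```

Calling `filter_blacklisted("all.txt", "blacklist.txt", "all_cleaned.txt")` would then write the cleaned file in a single pass. Dropping the per-line `print` from the question's loop is also likely to help, since console output is slow at this scale.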

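If you have to iterate over millions of records, it is naturally going to take some time. I would suggest a multithreading or multiprocessing approach, where each thread or process looks at a chunk of the data and compares it against the blacklist.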

# read both files fully into lists of lines
with open("all.txt", "r") as file1:
    f1 = file1.readlines()
with open("blacklist.txt", "r") as file2:
    f2 = file2.readlines()

def algo():
    duplicate = []    # lines that also appear in the blacklist
    unique = []       # lines that do not appear in the blacklist
    superunique = []
    for item1 in f1:
        if item1 in f2:
            duplicate.append(item1)
        else:
            unique.append(item1)
    # de-duplicated set of the lines that are not blacklisted
    superunique = set(unique) - set(duplicate)
    print("Duplicate values: ", duplicate)
    print("Unique Values: ", unique)
    print("Super Unique Value :", superunique)

algo()
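
The chunking idea described above could be sketched roughly as follows. All names here (`read_chunks`, `filter_chunk`, `filter_in_chunks`) are hypothetical, and the sketch assumes UTF-8 files and the question's `strip()`/`lower()` normalisation. The executor is passed in by the caller: a `ThreadPoolExecutor` is the simplest to try, but because of the GIL it is unlikely to speed up this CPU-bound filtering, while a `ProcessPoolExecutor` has to pickle every chunk and the blacklist, so whether either beats a plain loop over a set is something to measure rather than assume. Unlike the question's code, this version does not remove duplicates within the input.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice, repeat


def read_chunks(path, chunk_size):
    """Yield lists of at most chunk_size normalised lines from path."""
    with open(path, encoding="utf-8") as fp:
        while True:
            chunk = [line.strip().lower() for line in islice(fp, chunk_size)]
            if not chunk:
                return
            yield chunk


def filter_chunk(chunk, blacklist):
    """Return the entries of chunk that are not in blacklist, in order."""
    return [entry for entry in chunk if entry not in blacklist]


def filter_in_chunks(input_path, blacklist_path, output_path,
                     executor, chunk_size=100_000):
    """Filter input_path chunk by chunk; returns the number of lines written."""
    with open(blacklist_path, encoding="utf-8") as fp:
        blacklist = frozenset(line.strip().lower() for line in fp)

    kept = 0
    with open(output_path, "w", encoding="utf-8") as dst:
        # map() yields results in submission order, so line order is kept.
        # By default it submits every chunk up front, so the whole input
        # ends up queued in memory.
        results = executor.map(filter_chunk,
                               read_chunks(input_path, chunk_size),
                               repeat(blacklist))
        for clean in results:
            dst.writelines(entry + "\n" for entry in clean)
            kept += len(clean)
    return kept
```

A call might look like `with ThreadPoolExecutor(max_workers=4) as ex: filter_in_chunks("all.txt", "blacklist.txt", "all_cleaned.txt", ex)`. When swapping in `ProcessPoolExecutor`, the call needs to sit under an `if __name__ == "__main__":` guard on platforms that spawn worker processes.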
