只保留只出现一次的行

假设我有一个文本文件test.txt，它有

我想将输入文件中只出现一次的所有行写入一个新文件，并删除所有其他行：

001010
000011
111111

这段代码并没有达到我想要的效果，因为它只将重复的行减少到一行，但并没有完全删除它们：

lines_seen = set()
with open("leadsNoDupes.txt", "w+") as output_file:
for each_line in open("leads.txt", "r"):
if each_line not in lines_seen:
output_file.write(each_line)
lines_seen.add(each_line)

我该如何正确地执行此操作？

您需要维护一个计数：

from collections import Counter

with open("leadsNoDupes.txt", "w+") as output_file:
lines = list(open("leads.txt", "r"))
counts = Counter(lines)
for line in lines:
if counts[line] == 1:
output_file.write(line)

这种过度收集信息的行为，因为我们并不需要知道一条线是否发生了2、3或7次，但它仍然是线性的。

相关内容

最新更新

热门标签：