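I'm trying to compare two CSV files and write the entries they have in common to a third CSV file. For some reason, the outer loop iterates over every row in csv_input, but the inner loop over csv_compare only runs through its entries once and then stops at the last entry. I want to compare every row of the input file against every entry of the compare file.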
import csv
finalCSV = {}
with open('input.csv', newline='') as csvfile, open('compare.csv', newline='') as keyCSVFile, open('output.csv', 'w', newline='') as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_compare = csv.reader(keyCSVFile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))
    for row in csv_input:
        for entry in csv_compare:
            print(row[0] + ' ' + entry[0])
            if row[0] == entry[0]:
                csv_output.writerow(row)
                break
print('wait...')
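When you break out of the inner loop and start the next iteration of the outer loop, csv_compare does not reset to the beginning. It continues from where you left off. Once the iterator is exhausted, it yields nothing more.
You need to reset the iterator at the start of each iteration of the outer loop, and the simplest way to do that is to open the file there.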
import csv
with open('input.csv', newline='') as csvfile, open('output.csv', 'w', newline='') as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))
    for row in csv_input:
        # Reopen compare.csv for every input row so the reader starts from the top
        with open('compare.csv', newline='') as keyCSVFile:
            csv_compare = csv.reader(keyCSVFile)
            for entry in csv_compare:
                if row[0] == entry[0]:
                    csv_output.writerow(row)
                    break
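The exhaustion behaviour described above can be seen in isolation. The following is only a small sketch: it uses an in-memory io.StringIO buffer as a hypothetical stand-in for compare.csv, and the sample values are made up for illustration. It shows that a second pass over the same reader yields nothing, and that rewinding the underlying file and building a fresh reader makes the rows available again.

```python
import csv
import io

# Hypothetical in-memory stand-in for compare.csv (sample data is made up)
compare_file = io.StringIO("id\n1\n2\n")
csv_compare = csv.reader(compare_file)

# The first pass consumes every row of the reader
first_pass = [row[0] for row in csv_compare]

# The second pass gets nothing: the iterator is already exhausted
second_pass = [row[0] for row in csv_compare]

# Rewinding the file and creating a new reader starts over from the top
compare_file.seek(0)
third_pass = [row[0] for row in csv.reader(compare_file)]

print(first_pass, second_pass, third_pass)
```

Rewinding with seek(0) is an alternative to reopening the file on each outer iteration, although either way the compare file is re-read once per input row.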
I'd suggest reading the first column of csv_compare into a list or a set, and then using just a single for loop:
import csv
finalCSV = {}
with open("input.csv", newline="") as csvfile, open(
    "compare.csv", newline=""
) as keyCSVFile, open("output.csv", "w", newline="") as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_compare = csv.reader(keyCSVFile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))
    compare = {entry[0] for entry in csv_compare}  # <--- read csv_compare to a set
    for row in csv_input:
        if row[0] in compare:  # <--- use `in` operator
            csv_output.writerow(row)
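You can skip the inner loop entirely. A row from input.csv is added when its first column matches any first-column value in compare.csv, so put those values in a set for easy lookup.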
import csv
with open('compare.csv', newline='') as keyCSVFile:
    key_set = {row[0] for row in csv.reader(keyCSVFile)}
with open('input.csv', newline='') as csvfile, open('output.csv', 'w', newline='') as OutputCSV:
    csv_input = csv.reader(csvfile)
    csv_output = csv.writer(OutputCSV)
    csv_output.writerow(next(csv_input))
    csv_output.writerows(row for row in csv_input if row[0] in key_set)
del key_set
print('wait...')
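One thing the set-based versions do not guard against is blank lines: csv.reader yields an empty list for them, so row[0] would raise IndexError. Below is a minimal sketch of the same set lookup wrapped in a function that skips empty rows. The function name filter_rows_by_key and the sample data are hypothetical, and in-memory io.StringIO buffers stand in for the three files so the example runs on its own.

```python
import csv
import io


def filter_rows_by_key(input_file, compare_file, output_file):
    """Copy the header plus every input row whose first column appears in compare_file."""
    # Collect first-column values, ignoring blank lines (which parse as [])
    key_set = {row[0] for row in csv.reader(compare_file) if row}

    csv_input = csv.reader(input_file)
    csv_output = csv.writer(output_file)

    # An empty input file has no header, so there is nothing to write
    header = next(csv_input, None)
    if header is None:
        return
    csv_output.writerow(header)

    # Set membership is checked once per input row, with no inner loop
    csv_output.writerows(row for row in csv_input if row and row[0] in key_set)


# Hypothetical sample data; note the blank line in the input
input_data = io.StringIO("id,name\n1,alice\n2,bob\n\n3,carol\n")
compare_data = io.StringIO("id\n3\n1\n")
output_data = io.StringIO()

filter_rows_by_key(input_data, compare_data, output_data)
result = output_data.getvalue()
print(result)
```

With real files, the same function could be called with the handles from open('input.csv', newline=''), open('compare.csv', newline='') and open('output.csv', 'w', newline='').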