使用python查找两个csv文件列之间的差异



我有两个csv文件file1.csv

col1,col2,col3
1,2,3
4,5,6
7,8,9

file2.csv

col1,col2,col3
0,2,3
4,0,6
7,8,9

我想比较这两个文件列,将结果输出到另一个文件。file3.csv

col1,col2
1,0
0,5
0,0

我试过的代码,

import csv
with open('file1.csv', 'r') as t1:
old_csv = t1.readlines()
with open('file2.csv', 'r') as t2:
new_csv = t2.readlines()
with open('file3.csv', 'w') as out_file:
line_in_new = 1
line_in_old = 1
leng=len(new_csv)
out_file.write(new_csv[0])
while line_in_new < len(new_csv) and line_in_old < len(old_csv):
if (old_csv[line_in_old]) != (new_csv[line_in_new]):
out_file.write(new_csv[line_in_new])
else:
line_in_old += 1
line_in_new += 1

这是stackoverflow中一个答案的一个小改动版本。如何实现逐列比较?

提前感谢!

如果你已经读过你的台词,你可以这样做:

for i in range(min(len(old_csv), len(new_csv))):
for new_value,old_value in zip(new_csv[i].split(","), old_csv[i].split(",")): # you can add slicing here ([start:stop]) to only select certain columns
# write whatever you want to the new file e.g.:
new_file.write(str(int(new_value) - int(old_value)))

我希望这能回答你的问题。

如果两个CSV文件中的列数和行数相同,您可以使用pandas快速获取差异。

import pandas as pd
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
diff = df1 - df2
diff.to_csv('file3.csv', index=False)

file3.csv内容如下:

col1,col2,col3
1,0,0
0,5,0
0,0,0

James的答案是正确的,应该可以解决你的问题。如果您想避免像ID_col、string_cols这样的列,您可以尝试下面的代码。cols是要计算差值的列列表

import pandas as pd 
cols = ['req_col1','req_col2','req_col3']
df3 = pd.DataFrame(cols )
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
for col in cols:
df3[col] =  df1[col] -  df2[col]
df3.to_csv('filepath.csv')

相关内容

  • 没有找到相关文章

最新更新