我有两个csv文件file1.csv
col1,col2,col3
1,2,3
4,5,6
7,8,9
file2.csv
col1,col2,col3
0,2,3
4,0,6
7,8,9
我想比较这两个文件列,将结果输出到另一个文件。file3.csv
col1,col2
1,0
0,5
0,0
我试过的代码,
import csv
with open('file1.csv', 'r') as t1:
old_csv = t1.readlines()
with open('file2.csv', 'r') as t2:
new_csv = t2.readlines()
with open('file3.csv', 'w') as out_file:
line_in_new = 1
line_in_old = 1
leng=len(new_csv)
out_file.write(new_csv[0])
while line_in_new < len(new_csv) and line_in_old < len(old_csv):
if (old_csv[line_in_old]) != (new_csv[line_in_new]):
out_file.write(new_csv[line_in_new])
else:
line_in_old += 1
line_in_new += 1
这是stackoverflow中一个答案的一个小改动版本。如何实现逐列比较?
提前感谢!
如果你已经读过你的台词,你可以这样做:
for i in range(min(len(old_csv), len(new_csv))):
for new_value,old_value in zip(new_csv[i].split(","), old_csv[i].split(",")): # you can add slicing here ([start:stop]) to only select certain columns
# write whatever you want to the new file e.g.:
new_file.write(str(int(new_value) - int(old_value)))
我希望这能回答你的问题。
如果两个CSV文件中的列数和行数相同,您可以使用pandas快速获取差异。
import pandas as pd
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
diff = df1 - df2
diff.to_csv('file3.csv', index=False)
file3.csv
内容如下:
col1,col2,col3
1,0,0
0,5,0
0,0,0
James的答案是正确的,应该可以解决你的问题。如果您想避免像ID_col、string_cols这样的列,您可以尝试下面的代码。cols是要计算差值的列列表
import pandas as pd
cols = ['req_col1','req_col2','req_col3']
df3 = pd.DataFrame(cols )
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
for col in cols:
df3[col] = df1[col] - df2[col]
df3.to_csv('filepath.csv')