使用Python比较CSV并打印出源和目标的不同行



旧csv文件

第1列(列名:我不会比较列名。csv不按顺序(

AA101
BB101
CC101
DD101
EE101

新csv文件

第2列(列名:我不会比较列名。csv不按顺序(

AA101
CC101
BB101
DD102
EE102

预期结果文件:

Different:
Old
DD101 (it is not in the New file)
EE101 (it is not in the New file)
New
DD102 (it is not in the Old file)
DD101 (it is not in the Old file)

我引用了这篇文章并创建了以下代码将CSV匹配行与Python 进行比较

import csv
Source_filename = "E:PathSource1.csv"
Target_filename = "E:PathTarget1.csv"
output_filename = "E:results.csv"
# Load all the entries from Source into a set for quick lookup.
source_ids = set()
with open(Source_filename, 'r') as f:
big_ip = csv.reader(f)
for csv_row in big_ip:
source_ids.add(csv_row[0])
# print source_ids
with open(Source_filename, 'r') as input_file, open(output_filename, 'w') as output_file:
input_csv = csv.reader(input_file)
output_csv = csv.writer(output_file)
for csv_row in input_csv:
ip = csv_row[0]
status = "Present" if ip in source_ids else "Not Present"
output_csv.writerow([ip, status + " in Source.csv"])

输出的代码与源代码既相同又不同。我只需要从源和目标输出不同的输出

一个选项是使用Pandas。有很多方法可以做到这一点,这里有一种方法可以为您提供所有记录的完整列表;指示器";列设置为";两个";(如果记录出现在两个文件中(;仅left_only";(如果在旧文件中(,或";仅右"(如果在新文件中(。更多关于Pandas合并的信息请点击此处:

import pandas as pd
old = pd.read_csv('old_file.csv')
new = pd.read_csv('new_file.csv')
output = old.merge(
new,
left_on='old_column_name',
right_on='new_column_name',
how='outer',
indicator=True,
)
output.to_csv('output.csv')

您也可以在保存到csv:之前过滤指示器

output[output['_merge'] != 'both'].to_csv('output.csv')

使用Pandas和pd.merge:

>>> %cat Source1.csv
AA101
BB101
CC101
DD101
EE101
>>> %cat Target1.csv
AA101
CC101
BB101
DD102
EE102
# Python env: pip install pandas
# Anaconda env: conda install pandas
import pandas as pd
source = pd.read_csv('Source1.csv', names=['big_ip'], header=None)
target = pd.read_csv('Target1.csv', names=['big_ip'], header=None)
df = pd.merge(source, target, how='outer', indicator=True)
>>> df
big_ip      _merge
0  AA101        both  # <- present both in source and target
1  BB101        both
2  CC101        both
3  DD101   left_only  # <- present in source only (old)
4  EE101   left_only
5  DD102  right_only  # <- present in target only (new)
6  EE102  right_only

输出可以根据您的需要进行自定义。

最新更新