How to show the differences between a column of two CSV files



I have two CSV files:

old file:
name    size_bytes
air unknown
data/air/monitor    
data/air/monitor/ambient-air-quality-oil-sands-region   
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region   
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02    
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02/EN 
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02/EN/datapackage.json    886
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02/EN/digest.txt  186
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02/EN/JOSM_AMS13_SpecHg_AB_2017-04-02_EN.pdf  9033
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02/FR 
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02/FR/datapackage.json    886
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02/FR/digest.txt  186
...

new file:
name    size_bytes
data    0
data/air    0
data/air/monitor    0
data/air/monitor/ambient-air-quality-oil-sands-region   0
data/air/monitor/ambient-air-quality-oil-sands-region/96c679c3-709e-4a42-89c6-09f09f2b7ffe.xml  65589
data/air/monitor/ambient-air-quality-oil-sands-region/datapackage.json  13152367
data/air/monitor/ambient-air-quality-oil-sands-region/digest.txt    188
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region   0
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02    0
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02/FR 0
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02/FR/JOSM_AMS13_SpecHg_AB_2017-04-02_FR.pdf  9186
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-02/digest.txt 82
data/air/monitor/ambient-air-quality-oil-sands-region/ecosystem-sites-speciated-mercury-preliminary-data-oil-sands-region/2017-04-09    0
...

I want to compare the "old file" to the "new file" and get any names (folder or file paths) that are missing.

Right now I have this:

with open('old_file.csv', 'r') as old_file:
    old = set(row.split(',')[0].strip().lower() for row in old_file)
with open('new_file.csv','r') as new_file, open('compare.csv', 'w') as compare_files:
    for line in new_file:
        if line.split(',')[0].strip().lower() not in old:
            compare_files.write(line)

This runs, but the output is incorrect: it prints names that ARE in both files. Here is the output:

data    0
data/air    0
data/air/monitor/deposition-oil-sands-region/the-monitored-ambient-concentration-and-estimated-atmospheric-deposition-of-trace-elements-at-four-monitoring-sites-in-the-canadian-athabasca-oil-sands-region 0
data/air/monitor/deposition-oil-sands-region/the-monitored-ambient-concentration-and-estimated-atmospheric-deposition-of-trace-elements-at-four-monitoring-sites-in-the-canadian-athabasca-oil-sands-region/ElementConcentrationPM25_OSM_AMS-sites_2016-2017.csv    736737
data/air/monitor/deposition-oil-sands-region/the-monitored-ambient-concentration-and-estimated-atmospheric-deposition-of-trace-elements-at-four-monitoring-sites-in-the-canadian-athabasca-oil-sands-region/ElementConcentrationPM25to10_OSM_AMS-sites_2016-2017.csv    227513
data/air/monitor/deposition-oil-sands-region/the-monitored-ambient-concentration-and-estimated-atmospheric-deposition-of-trace-elements-at-four-monitoring-sites-in-the-canadian-athabasca-oil-sands-region/ElementFlux_OSM_AMS-sites_2016-2017.csv 691252
data/air/monitor/deposition-oil-sands-region/the-monitored-ambient-concentration-and-estimated-atmospheric-deposition-of-trace-elements-at-four-monitoring-sites-in-the-canadian-athabasca-oil-sands-region/ffeae500-ea0c-493f-9b24-5efbd16411fd.xml    41399
data/air/monitor/monitoring-of-atmospheric-precipitation-chemistry/major-ions/AtmosphericPrecipitationChemistry-MajorIons-APQMP-AllSites-2019.csv   169109
data/air/monitor/monitoring-of-atmospheric-precipitation-chemistry/major-ions/AtmosphericPrecipitationChemistry-MajorIons-APQMP-AllSites-2020.csv   150205
data/air/monitor/monitoring-of-atmospheric-precipitation-chemistry/major-ions/AtmosphericPrecipitationChemistry-MajorIons-CAPMoN-AllSites-2017.csv  4343972
data/air/monitor/monitoring-of-atmospheric-precipitation-chemistry/major-ions/AtmosphericPrecipitationChemistry-MajorIons-CAPMoN-AllSites-2018.csv  3782783
data/air/monitor/monitoring-of-combined-atmospheric-gases-and-particles/major-ions-and-acidifying-gases 0
data/air/monitor/monitoring-of-combined-atmospheric-gases-and-particles/major-ions-and-acidifying-gases/AtmosphericCombinedGasesParticles-FilterPack-CAPMoN-AllSites-2012.csv   1826690
data/air/monitor/monitoring-of-combined-atmospheric-gases-and-particles/major-ions-and-acidifying-gases/AtmosphericCombinedGasesParticles-FilterPack-CAPMoN-AllSites-2013.csv   1890761
data/air/monitor/monitoring-of-combined-atmospheric-gases-and-particles/major-ions-and-acidifying-gases/AtmosphericCombinedGasesParticles-FilterPack-CAPMoN-AllSites-2014.csv   1946788
data/air/monitor/monitoring-of-combined-atmospheric-gases-and-particles/major-ions-and-acidifying-gases/AtmosphericCombinedGasesParticles-FilterPack-CAPMoN-AllSites-2015.csv   2186536
data/air/monitor/monitoring-of-combined-atmospheric-gases-and-particles/major-ions-and-acidifying-gases/AtmosphericCombinedGasesParticles-FilterPack-CAPMoN-AllSites-2016.csv   2434692
data/air/monitor/monitoring-of-combined-atmospheric-gases-and-particles/major-ions-and-acidifying-gases/AtmosphericCombinedGasesParticles-FilterPack-CAPMoN-AllSites-2017.csv   2150499
data/air/monitor/monitoring-of-combined-atmospheric-gases-and-particles/major-ions-and-acidifying-gases/AtmosphericCombinedGasesParticles-FilterPack-CAPMoN-AllSites-2018.csv   2136853
...

Is there something wrong with my code? Is there a better way to do this? Maybe with pandas?

Your tags mention pandas, but I don't see you using it. Either way, if I understand your question correctly, an outer merge should do what you want:

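import pandas as pd

# path_to_old_file and path_to_new_file are placeholders for your two CSV paths
old = pd.read_csv(path_to_old_file)
new = pd.read_csv(path_to_new_file)
# An outer merge keeps every name that appears in either file
df = pd.merge(old, new, on="name", how="outer")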
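As a rough sketch of how the outer merge can be turned into a list of missing names, pandas' `indicator=True` option records which side each row came from. The two small DataFrames below are made-up stand-ins for the real files, and the variable names `missing_from_new` and `added_in_new` are only illustrative.

```python
import pandas as pd

# Hypothetical miniature versions of the two files (not the real data).
old = pd.DataFrame({
    "name": ["data/air", "data/air/old_only.txt", "data/air/shared.txt"],
    "size_bytes": [0, 186, 886],
})
new = pd.DataFrame({
    "name": ["data/air", "data/air/shared.txt", "data/air/new_only.txt"],
    "size_bytes": [0, 900, 188],
})

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only" or "both".
merged = pd.merge(old, new, on="name", how="outer",
                  suffixes=("_old", "_new"), indicator=True)

# Names present in the old file but missing from the new one.
missing_from_new = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
# Names that appear only in the new file.
added_in_new = merged.loc[merged["_merge"] == "right_only", "name"].tolist()

print(missing_from_new)
print(added_in_new)
```

Filtering on `_merge` keeps the size columns from both sides available, which may help if you later want to compare sizes as well.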

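Your post isn't very clear about what exactly you need, and I don't particularly want to comb through those file names looking for differences. From what I can gather, you want all the unique file paths across the two CSV files, right? It's also unclear what you want to do with the other column, so I've left it alone.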

I recommend reading this Stack Overflow post.

Edit

After the clarification:

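import numpy as np
import pandas as pd

old = pd.read_csv(path_to_old_file)
new = pd.read_csv(path_to_new_file)
# Sorted unique names that are in old["name"] but not in new["name"]
np.setdiff1d(old["name"], new["name"])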

This gives you all the values in the name column of the old DataFrame that do not exist in the new DataFrame.
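For comparison, the same set difference can be done without pandas. The sketch below uses the standard `csv` module and assumes the files are tab-delimited, which the sample in the question suggests but does not confirm. If that is the case, splitting each line on ',' would compare whole lines (name plus size) rather than names, which may be one reason the original script reports names that exist in both files. The helper names `read_names` and `missing_names` and the demo data are hypothetical.

```python
import csv
import os
import tempfile


def read_names(path, delimiter="\t"):
    """Return the set of values found in the 'name' column of a delimited file."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        return {row["name"].strip() for row in reader if row["name"]}


def missing_names(old_path, new_path, delimiter="\t"):
    """Names listed in the old file but absent from the new file, sorted."""
    return sorted(read_names(old_path, delimiter) - read_names(new_path, delimiter))


# Tiny demo with made-up data; the delimiter is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "old_file.csv")
    new_path = os.path.join(tmp, "new_file.csv")
    with open(old_path, "w", newline="") as f:
        f.write("name\tsize_bytes\ndata/air\t\ndata/air/digest.txt\t186\n")
    with open(new_path, "w", newline="") as f:
        f.write("name\tsize_bytes\ndata\t0\ndata/air\t0\n")
    result = missing_names(old_path, new_path)

print(result)
```

If your files really are comma-separated, pass `delimiter=","` instead. Swapping the two arguments gives the names that were added in the new file.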
