Is there a way to extract only the lines unique to one file when comparing two files?

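OK, I have a file, let's call it file1.txt. It has 5000 lines. I also have file2.txt, which has 2000000 lines.

I ran the following command: comm -23 <(sort file2.txt) <(sort file1.txt) > file3.txt

I now have file3.txt with 1996000 lines. I want to extract the 1000 lines that appear only in file1.txt. How can I do that?

I tried: comm -23 <(sort file1.txt) <(sort file3.txt) > file4.txt, but it did not work. file4.txt was not filtered; it is a copy of file1.txt.

Thanks in advance.

PS: I am using cygwin, so some features may be limited. Many thanks.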

Use awk to get the lines unique to file1. First, some test data (the comments are not part of the data):

file1:

1  # unique in file1 so this is what we want
2  # common in file1 and file2

file2:

2  # common in file1 and file2
3  # unique in file2

awk:

$ awk '
NR==FNR {             # process file1
    a[$0]             # hash all records
    next
}                     # process file2 below this point
($0 in a) {           # if common entry found in hash
    delete a[$0]      # delete it from the hash
}
END {                 # in the end
    for(i in a)       # loop all leftovers
        print i       # and output them
}' file1 file2        # mind the order

Output:

1

Because of how awk iterates over array keys in for(i in a), the output will not be in any meaningful order.
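If the original order of file1 matters, one possible variant is to read file2 first and then print each line of file1 that was not seen. This is only a sketch: the sample data and the names tmp, seen and result are made up for illustration, and unlike the script above it holds all of file2 (the large file) in memory rather than file1.

```shell
# Hypothetical sample data; file1 is deliberately unsorted.
tmp=$(mktemp -d)
printf '%s\n' 9 1 2 5 > "$tmp/file1"
printf '%s\n' 2 3     > "$tmp/file2"

# First file on the command line (file2): remember every line.
# Second file (file1): print each line that was not seen in file2.
result=$(awk 'NR==FNR { seen[$0]; next } !($0 in seen)' "$tmp/file2" "$tmp/file1")

echo "$result"   # prints 9, 1 and 5 on separate lines, in file1 order
rm -rf "$tmp"
```

Note the argument order is reversed compared with the script above: file2 comes first here.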

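On the face of it, the question is how to use file3 to extract the unique lines from file1. Since file3 contains only the lines unique to file2, it has no lines in common with file1, so the last comm (file1 against file3) will not remove anything from file1.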
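The reasoning can be checked on tiny data. The sketch below repeats both comm steps from the question on made-up two-line files; the temporary directory, the .sorted file names and the variable same are assumptions for the demo. It sorts into temporary files instead of using <(...) so that it also runs in a plain sh.

```shell
tmp=$(mktemp -d)
printf '%s\n' 1 2 > "$tmp/file1.txt"   # "1" is unique to file1
printf '%s\n' 2 3 > "$tmp/file2.txt"   # "3" is unique to file2

sort "$tmp/file1.txt" > "$tmp/file1.sorted"
sort "$tmp/file2.txt" > "$tmp/file2.sorted"

# First step from the question: lines found only in file2 go to file3.
comm -23 "$tmp/file2.sorted" "$tmp/file1.sorted" > "$tmp/file3.txt"

# Second step from the question: lines in file1 but not in file3 go to file4.
comm -23 "$tmp/file1.sorted" "$tmp/file3.txt" > "$tmp/file4.txt"

# file3 shares no line with file1, so nothing is filtered out of file4.
if cmp -s "$tmp/file1.sorted" "$tmp/file4.txt"; then
    same=yes
else
    same=no
fi
echo "file4 identical to sorted file1: $same"   # prints "... yes"
rm -rf "$tmp"
```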

Consider:

comm -23 <(sort -t: -u file1.txt) <(sort -t: -u file2.txt)

grep -F -x -f file1.txt file3.txt
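A grep-only variant that targets the lines unique to file1 might look like the sketch below. It is not the command above: it adds -v and uses file2.txt as the pattern file, which is my assumption about what is wanted. The sample data and the variable unique are hypothetical. With a pattern file of 2000000 lines it may use a lot of memory.

```shell
tmp=$(mktemp -d)
printf '%s\n' 1 2 > "$tmp/file1.txt"
printf '%s\n' 2 3 > "$tmp/file2.txt"

# -F  treat patterns as fixed strings, not regular expressions
# -x  match whole lines only
# -v  invert: keep lines that match none of the patterns
# -f  read the patterns (one per line) from file2.txt
unique=$(grep -F -x -v -f "$tmp/file2.txt" "$tmp/file1.txt")

echo "$unique"   # prints 1
rm -rf "$tmp"
```

No sorting is needed here, and the surviving lines keep their file1 order.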

Full disclosure: this answer was found here.
