c - How to compare files that are too similar



I have two text files like this:

Lines look like => SITE.MACHINE.VARIABLE_NAME=VARIABLE_VALUE

CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC13.CHRONO_SANSREPONSE_KEEPALIVE=0
DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=32099
...

They have already been sorted with sort -u.

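I need to find which lines exist in only one of the two files or have been modified (I don't care about the lines they have in common), much like the sdiff command does. But the lines in these files are so similar that diff pairs them up incorrectly.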

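My idea is to diff on the part to the left of the "=" and, where that part matches, compare the right-hand side. I'm looking for a solution that prints output like sdiff's, or something similar.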

Example of the desired output:

File1                                                         | File2
CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES="1:0:1:1:0:0:0:0:0"  | CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES="1:0:1:1:0:0:0:1:0"
CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=1              | CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE=1               | CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE=1             | CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE=0
CPM-NOMINAL.WAC12.PROTOCOLE_PDD=2                             | CPM-NOMINAL.WAC12.PROTOCOLE_PDD=3
> CPM-NOMINAL.WAC7.SQL_PROC_INIT_XAPDD_MBN_TEST="p_initialiser"
CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE=FALSE                   | CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE=TRUE
DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=3201                    | DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=32099
DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT=3201                    | DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT=3204

Thanks.

Something similar can be done with join:

$ join -a1 -a2 -e"---" -t= -o1.1,1.2,2.2,2.1 file1 file2 | column -ts=
CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES         "1:0:1:1:0:0:0:0:0"             "1:0:1:1:0:0:0:1:0"  CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES
CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE   1                               0                    CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE
CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE    1                               0                    CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE
CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE  1                               0                    CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE
CPM-NOMINAL.WAC12.PROTOCOLE_PDD                  2                               3                    CPM-NOMINAL.WAC12.PROTOCOLE_PDD
---                                              ---                             "p_initialiser"      CPM-NOMINAL.WAC7.SQL_PROC_INIT_XAPDD_MBN_TEST
CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE            FALSE                           TRUE                 CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE
DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT            3201                            32099                DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT
DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT            3201                            3204                 DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT

Piping the result through awk '$2!=$3' filters out the rows whose values are identical in both files.
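
As a rough sketch of the complete pipeline, the filter can also be applied before column, while the fields are still separated by "=", so that values containing spaces do not confuse awk. The sample data and the helper name kv_diff are made up for illustration; the sketch also assumes the files are sorted on the key in the C locale and that values do not themselves contain "=".

```shell
# Hypothetical sample data standing in for file1 and file2 (sorted key=value lines).
tmp=$(mktemp -d)
printf '%s\n' 'A.M1.PORT=3201' 'A.M1.SAME=1' 'A.M2.ONLY1=x' > "$tmp/file1"
printf '%s\n' 'A.M1.PORT=32099' 'A.M1.SAME=1' 'A.M3.ONLY2=y' > "$tmp/file2"

# kv_diff is an illustrative helper name, not a standard command.
kv_diff() {
    # -a1 -a2      : also print lines that have no partner in the other file
    # -e '---'     : placeholder for a missing field
    # -t=          : fields are separated by "="
    # -o 0,1.2,2.2 : join key, value from file 1, value from file 2
    LC_ALL=C join -a1 -a2 -e '---' -t= -o 0,1.2,2.2 "$1" "$2" |
        # Appending "" forces a string comparison, so 01 and 1 count as different.
        awk -F= '$2 "" != $3 ""'
}

kv_diff "$tmp/file1" "$tmp/file2"
```

Appending | column -t -s= to the last command lines the three fields up in columns, as in the output above.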

Here is one possible way to do this with traditional tools and pipes. I use the terms key and value because the lines look like

key = value

The following list of commands gives you possible answers:

# lines common between file1 and file2 (-x matches whole lines, not substrings)
grep -Fx -f file1 file2
# lines in file2 not in file1
grep -v -Fx -f file1 file2
# changed key values from file1 to file2
# (keys are matched as substrings, so a key contained in a longer key also matches)
cut -d'=' -f1 file1 | grep -F -f - <(grep -v -Fx -f file1 file2)
# keys in file2 but not in file1
cut -d'=' -f1 file1 | grep -v -F -f - file2
# keys in file1 but not in file2
cut -d'=' -f1 file2 | grep -v -F -f - file1
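
A fixed-string grep without -x matches substrings, so a line such as K.B=1 also matches K.B=10. Since the files are already sorted, comm is one alternative for the whole-line comparisons, because it only ever compares complete lines. The following is a minimal sketch with made-up sample data and hypothetical helper names; it assumes both files are sorted in the C locale.

```shell
# Hypothetical sample data; K.B=1 is a substring of K.B=10 on purpose.
tmp=$(mktemp -d)
printf '%s\n' 'K.A=1' 'K.B=10' 'K.C=5' > "$tmp/file1"
printf '%s\n' 'K.A=1' 'K.B=1' 'K.D=7' > "$tmp/file2"

# Illustrative helper names, not standard commands.
# comm prints three columns: only in file 1, only in file 2, in both.
# -1, -2 and -3 suppress the corresponding column.
common_lines()   { LC_ALL=C comm -12 "$1" "$2"; }
only_in_first()  { LC_ALL=C comm -23 "$1" "$2"; }
only_in_second() { LC_ALL=C comm -13 "$1" "$2"; }

echo "common:";          common_lines   "$tmp/file1" "$tmp/file2"
echo "only in file1:";   only_in_first  "$tmp/file1" "$tmp/file2"
echo "only in file2:";   only_in_second "$tmp/file1" "$tmp/file2"
```

Changed values then show up twice, once in each "only in" list, which is why the key-based commands are still needed to pair them.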

Or you can just use a simple awk script; it is not the most optimized, but it gives clean output:

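$ awk '
BEGIN { FS = " *= *" }                   # split on "=", ignoring surrounding spaces
{ key = $1; value = $2 }
(NR == FNR) { a[key] = value; next }     # first file
{ b[key] = value }                       # second file
END {
    # keys present in both files
    for (key in a) if (key in b) {
        print (a[key] == b[key] ? "COMM" : "DIFF"), key, "=", a[key], "<=>", b[key]
        delete a[key]
        delete b[key]
    }
    # keys left only in file1
    for (key in a) {
        print "UNI1", key, "=", a[key]
    }
    # keys left only in file2
    for (key in b) {
        print "UNI2", key, "=", b[key]
    }
}' file1 file2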

This produces output that looks like

COMM key1 = val1 <=> val1
COMM key2 = val2 <=> val2
DIFF key3 = val31 <=> val32      
COMM key4 = val4 <=> val4
UNI1 key5 = val5
UNI2 key6 = val6      
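
To get closer to the sdiff-like layout asked for in the question, the same idea can be adapted to print whole lines side by side and to drop the common ones. This is only a sketch under a few assumptions: the helper name side_by_side and the sample data are invented, the first file is not empty, lines contain no tab characters, and the columns are not padded the way sdiff pads them.

```shell
# Hypothetical sample data standing in for file1 and file2.
tmp=$(mktemp -d)
printf '%s\n' 'S.M.A=1' 'S.M.B=FALSE' 'S.M.C=x' > "$tmp/file1"
printf '%s\n' 'S.M.A=1' 'S.M.B=TRUE' 'S.M.D="p=q"' > "$tmp/file2"

# side_by_side is an illustrative helper name, not a standard command.
side_by_side() {
    awk '
    {
        # Split at the first "=" only, so values may themselves contain "=".
        eq = index($0, "=")
        if (eq == 0) next                  # skip lines without "="
        key = substr($0, 1, eq - 1)
    }
    NR == FNR { a[key] = $0; next }        # first file: remember the whole line
    { b[key] = $0 }                        # second file
    END {
        # Each output line is prefixed with "key<TAB>" so it can be sorted by key.
        for (key in a) {
            if (!(key in b))           print key "\t< " a[key]
            else if (a[key] != b[key]) print key "\t" a[key] " | " b[key]
        }
        for (key in b)
            if (!(key in a))           print key "\t> " b[key]
    }' "$1" "$2" | LC_ALL=C sort | cut -f2-    # sort by key, then drop the key prefix
}

side_by_side "$tmp/file1" "$tmp/file2"
```

Lines marked "<" exist only in the first file, lines marked ">" only in the second, and lines containing " | " have the same key but different values.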
