Compare two files side by side with awk and get the expected output



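I have two files, 1.xml and 2.xml. Both files are sorted and have different lengths. I want to use awk to compare them and print the matching and non-matching lines.

1.xml: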

AGPS=1_<Class>_AGPS -> allowedAudit == false

AGPS=1_<Class>_AGPS -> allowedAudit == false

AGPS=1_<Class>_AGPS -> horizontalAccuracy == 100

AGPS=1_<Class>_AGPS -> horizontalAccuracy == 50

AGPS=1_<Class>_AGPS -> id == 1

AGPS=1_<Class>_AGPS -> id == 2

AGPS=1_<Class>_AGPS -> ionosphericModelAllowed == true

AGPS=1_<Class>_AGPS -> maxNumGPSSatellites == 8

AGPS=1_<Class>_AGPS -> maxNumGPSSatellites == 8

AGPS=1_<Class>_AGPS -> maxUeBasedAGPSProcedureTime == 24

2.xml:

AGPS=1_<Class>_AGPS -> allowedAudit == false

AGPS=1_<Class>_AGPS -> allowedAudit == true

AGPS=1_<Class>_AGPS -> horizontalAccuracy == 120

AGPS=1_<Class>_AGPS -> horizontalAccuracy == 50

AGPS=1_<Class>_AGPS -> id == 1

AGPS=1_<Class>_AGPS -> id == 3

AGPS=1_<Class>_AGPS -> ionosphericModelAllowed == true

AGPS=1_<Class>_AGPS -> maxNumGPSSatellites == 8

Code used:

awk -F"==" 'FNR==NR { array1[$1]=$2;array[$2]=$2; next } { print ($2 in array ? $0 : $0" "array1[$1]" ""NM"), array[$2] }' 2.xml 1.xml

Output:

AGPS=1_<Class>_AGPS -> allowedAudit == false false

AGPS=1_<Class>_AGPS -> allowedAudit == false false

AGPS=1_<Class>_AGPS -> horizontalAccuracy == 100 50 NM

AGPS=1_<Class>_AGPS -> horizontalAccuracy == 50 50

AGPS=1_<Class>_AGPS -> id == 1 1

AGPS=1_<Class>_AGPS -> id == 2 3 NM

AGPS=1_<Class>_AGPS -> ionosphericModelAllowed == true true

AGPS=1_<Class>_AGPS -> maxNumGPSSatellites == 8 8

AGPS=1_<Class>_AGPS -> maxNumGPSSatellites == 8 8

AGPS=1_<Class>_AGPS -> maxUeBasedAGPSProcedureTime == 24 NM

Expected output:

AGPS=1_<Class>_AGPS -> allowedAudit == false false

AGPS=1_<Class>_AGPS -> allowedAudit == false false

AGPS=1_<Class>_AGPS -> horizontalAccuracy == 100 120 NM

AGPS=1_<Class>_AGPS -> horizontalAccuracy == 50 50

AGPS=1_<Class>_AGPS -> id == 1 1

AGPS=1_<Class>_AGPS -> id == 2 3 NM

AGPS=1_<Class>_AGPS -> ionosphericModelAllowed == true true

AGPS=1_<Class>_AGPS -> maxNumGPSSatellites == 8 8

AGPS=1_<Class>_AGPS -> maxNumGPSSatellites == 8 NF

AGPS=1_<Class>_AGPS -> maxUeBasedAGPSProcedureTime == 24 NF

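Lines that have no counterpart in the other file need extra code so that they are marked as Not Found (NF).

The logic also fails for some of the not-matching (NM) cases, although it works for others.

The actual files are large, and I have only had partial success.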
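One likely reason the NM column goes wrong is that `array1[$1]=$2` keeps a single value per key, so repeated keys in the first file overwrite one another. The following is a minimal sketch of that effect, using the two horizontalAccuracy lines from 2.xml; the array name `last` is only illustrative.

```shell
# Both records share the text to the left of "==", so the second
# assignment replaces the first and only " 50" survives.
result=$(printf '%s\n' \
  'AGPS=1_<Class>_AGPS -> horizontalAccuracy == 120' \
  'AGPS=1_<Class>_AGPS -> horizontalAccuracy == 50' |
  awk -F"==" '{ last[$1] = $2 } END { for (k in last) print "[" last[k] "]" }')
echo "$result"
```

This would explain the `horizontalAccuracy == 100 50 NM` line in the output above: the value 120 had already been overwritten by 50 when 1.xml was read.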

awk to the rescue!

$ awk -F' == ' 'NR==FNR {a[$1,++c[$1]]=$2; next} 
{print $1 FS $2, v=a[$1,++d[$1]], (v!=$2)?"NM":""; 
delete a[$1,d[$1]]} 
END     {for(k in a) 
{split(k,ks,SUBSEP); 
print ks[1] FS a[k],"NF"}}' file1 file2
AGPS=1_<Class>_AGPS -> allowedAudit == false false
AGPS=1_<Class>_AGPS -> allowedAudit == true false NM
AGPS=1_<Class>_AGPS -> horizontalAccuracy == 120 100 NM
AGPS=1_<Class>_AGPS -> horizontalAccuracy == 50 50
AGPS=1_<Class>_AGPS -> id == 1 1
AGPS=1_<Class>_AGPS -> id == 3 2 NM
AGPS=1_<Class>_AGPS -> ionosphericModelAllowed == true true
AGPS=1_<Class>_AGPS -> maxNumGPSSatellites == 8 8
AGPS=1_<Class>_AGPS -> maxNumGPSSatellites == 8 NF
AGPS=1_<Class>_AGPS -> maxUeBasedAGPSProcedureTime == 24 NF

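Duplicate keys are handled by adding a per-key counter as part of the array key. Matching works by keeping the counters in step: the n-th occurrence of a key in one file is paired with the n-th occurrence of the same key in the other. Finally, any leftover records are printed as NF.

P.S. Your expected output for record 2 is wrong.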
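As a variation on the same counter idea, here is a sketch that marks unmatched lines of the second file as NF straight away and prints leftovers of the first file in input order, since the order of `for (k in a)` is unspecified in awk. The file names `first.txt` and `second.txt`, the shortened sample records, and the array names are assumptions made for illustration.

```shell
tmp=$(mktemp -d)
# Shortened stand-ins for the two input files.
printf '%s\n' 'acc == 120' 'acc == 50' 'sat == 8' 'zone == 3' > "$tmp/first.txt"
printf '%s\n' 'acc == 100' 'acc == 50' 'sat == 8' 'sat == 8' 'time == 24' > "$tmp/second.txt"

result=$(awk -F' == ' '
  # First file: store each value under (key, occurrence number)
  # and remember the order in which the entries were read.
  NR == FNR { k = $1 SUBSEP (++seen[$1]); ref[k] = $2; order[++n] = k; next }
  # Second file: pair the n-th occurrence of a key with the n-th stored one.
  {
    k = $1 SUBSEP (++cnt[$1])
    if (k in ref) {
      v = ref[k]
      # Appending "" forces a string comparison rather than a numeric one.
      print $0, v ((v "") == ($2 "") ? "" : " NM")
      delete ref[k]
    } else
      print $0, "NF"
  }
  # Entries never consumed exist only in the first file.
  END {
    for (i = 1; i <= n; i++)
      if (order[i] in ref) {
        split(order[i], p, SUBSEP)
        print p[1] FS ref[order[i]], "NF"
      }
  }
' "$tmp/first.txt" "$tmp/second.txt")
echo "$result"
rm -rf "$tmp"
```

With the sample data this prints `acc == 100 120 NM`, `acc == 50 50`, `sat == 8 8`, `sat == 8 NF`, `time == 24 NF`, and finally `zone == 3 NF` for the record that appears only in the first file.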
