使用awk比较2个文件并打印匹配行和非匹配行



我有2个CSV文件匹配和非匹配字段。
我想比较第二,第三和第四列,并在此基础上输出匹配(M),不匹配(NM)和未找到(NF with NULL)列。

a)如果col. 2,3和4完全匹配,则它是匹配的。
b)如果col. 2和3匹配但不匹配,那么它应该不匹配。
c)如果col. 2或col. 3本身不匹配,则应为not found case。

1. csv

SL_2344,personal_details,name,Andrew  
SL_2344,personal_details,address,G-101 SSR New-Delhi  
SL_2344,personal_details,Age,22Y  
SL_2344,personal_details,sex,M  
SL_2344,personal_details,height,5.8 ft  
SL_2344,education,Roll_number,22345  
SL_2344,education,stream,ScienceandMaths  
SL_2344,class,section,3D

2. csv

SL_12332,personal_details,name,Samantha  
SL_12332,personal_details,address,Park Street Mumbai  
SL_12332,personal_details,Age,22Y  
SL_12332,personal_details,sex,F  
SL_12332,height,5.8 ft  
SL_12332,class,section,3D  
SL_12332,candidate_Other_details,sports,stateLevelBasketballrepresentation

输出所需的

Class,Attributes,2344,12332,Remarks  
personal_details,name,Andrew,Samantha,NM  
personal_details,address,G-101 SSR New-Delhi,Park Street Mumbai,NM  
personal_details,Age,22Y,22Y,M  
personal_details,sex,M,F,NM  
personal_details,height,5.8 ft,NULL,NF  
education,Roll_number,22345,NULL,NF  
education,stream,ScienceandMaths,NULL,NF  
class,section,3D,3D,M  
NULL,height,NULL,5.3 ft,NF  
candidate_Other_details,NULL,sports,stateLevelBasketballrepresentation,NF

我已经尝试了awk关联数组的组合为$2,$3和$4使用NR,FNR,但无法得到所需的结果。
一些记录,如第5行,文件2.csv,只有属性(没有类对象)的值维护在我的代码失败的第三列。对于这样的记录,$2可以使用NULL或Blank。

使用GNU awk

awk -F, 'NR==FNR { map[FNR]=$0;next } { split(map[FNR],map1,",");if ( $2==map1[2] && $3==map1[3] && $4==map1[4]) { print $0",M" } else if ( $2==map1[2] || $3==map1[3] && $4!=map1[4] ) { print $0",NM" } else { print $0",NF" } }' 1.csv 2.csv

解释:

awk -F, 'NR==FNR {                                                           # Set the field delimiter to ","
map[FNR]=$0;                                              # When processing the first file (NR==FNR), create an array map with the file number record as the index and the line as the value
next 
} 
{ 
split(map[FNR],map1,",");                                  # For the second file, split the array entry into map1 using "," as the delimiter
if ( $2==map1[2] && $3==map1[3] && $4==map1[4]) { 
print $0",M"                                            # Print "M" entries based on the logic outlined utilising the split entries in map1.
} 
else if ( $2==map1[2] || $3==map1[3] && $4!=map1[4] ) {    # Print the "NM" entries based on the logic outlined
print $0",NM" 
} 
else { 
print $0",NF"                                           # Print the "NF" entries in all other cases.
} 
}' 1.csv 2.csv

最新更新