我有2个CSV文件匹配和非匹配字段。
我想比较第二,第三和第四列,并在此基础上输出匹配(M),不匹配(NM)和未找到(NF with NULL)列。
a)如果col. 2,3和4完全匹配,则它是匹配的。
b)如果col. 2和3匹配但不匹配,那么它应该不匹配。
c)如果col. 2或col. 3本身不匹配,则应为not found case。
1. csv
SL_2344,personal_details,name,Andrew
SL_2344,personal_details,address,G-101 SSR New-Delhi
SL_2344,personal_details,Age,22Y
SL_2344,personal_details,sex,M
SL_2344,personal_details,height,5.8 ft
SL_2344,education,Roll_number,22345
SL_2344,education,stream,ScienceandMaths
SL_2344,class,section,3D
2. csv
SL_12332,personal_details,name,Samantha
SL_12332,personal_details,address,Park Street Mumbai
SL_12332,personal_details,Age,22Y
SL_12332,personal_details,sex,F
SL_12332,height,5.8 ft
SL_12332,class,section,3D
SL_12332,candidate_Other_details,sports,stateLevelBasketballrepresentation
输出所需的
Class,Attributes,2344,12332,Remarks
personal_details,name,Andrew,Samantha,NM
personal_details,address,G-101 SSR New-Delhi,Park Street Mumbai,NM
personal_details,Age,22Y,22Y,M
personal_details,sex,M,F,NM
personal_details,height,5.8 ft,NULL,NF
education,Roll_number,22345,NULL,NF
education,stream,ScienceandMaths,NULL,NF
class,section,3D,3D,M
NULL,height,NULL,5.3 ft,NF
candidate_Other_details,NULL,sports,stateLevelBasketballrepresentation,NF
我已经尝试了awk关联数组的组合为$2,$3和$4使用NR,FNR,但无法得到所需的结果。
一些记录,如第5行,文件2.csv,只有属性(没有类对象)的值维护在我的代码失败的第三列。对于这样的记录,$2可以使用NULL或Blank。
使用GNU awk
awk -F, 'NR==FNR { map[FNR]=$0;next } { split(map[FNR],map1,",");if ( $2==map1[2] && $3==map1[3] && $4==map1[4]) { print $0",M" } else if ( $2==map1[2] || $3==map1[3] && $4!=map1[4] ) { print $0",NM" } else { print $0",NF" } }' 1.csv 2.csv
解释:
awk -F, 'NR==FNR { # Set the field delimiter to ","
map[FNR]=$0; # When processing the first file (NR==FNR), create an array map with the file number record as the index and the line as the value
next
}
{
split(map[FNR],map1,","); # For the second file, split the array entry into map1 using "," as the delimiter
if ( $2==map1[2] && $3==map1[3] && $4==map1[4]) {
print $0",M" # Print "M" entries based on the logic outlined utilising the split entries in map1.
}
else if ( $2==map1[2] || $3==map1[3] && $4!=map1[4] ) { # Print the "NM" entries based on the logic outlined
print $0",NM"
}
else {
print $0",NF" # Print the "NF" entries in all other cases.
}
}' 1.csv 2.csv