Hi guys, I have two files, each with N columns and M rows.
File1
1 2 4 6 8
20 4 8 10 12
15 5 7 9 11
File2
1 a1 b1 c5 d1
2 a1 b2 c4 d2
3 a2 b3 c3 d3
19 a3 b4 c2 d4
14 a4 b5 c1 d5
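For each row of File1, I need to find the row of File2 whose column 1 value is closest, and print both rows in the output. So, for example, the output should be:
File3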
1 2 4 6 8
1 a1 b1 c5 d1
20 4 8 10 12
19 a3 b4 c2 d4
15 5 7 9 11
14 a4 b5 c1 d5
Since 1 = 1, 19 is closest to 20, and 14 is closest to 15, those are the rows in the output. How can I do this in awk or any other tool?
Help!
Here is what I have so far:
echo "ARGIND == 1 {
    s1[$1]=$1;
    s2[$1]=$2;
    s3[$1]=$3;
    s4[$1]=$4;
    s5[$1]=$5;
}
ARGIND == 2 {
    bestdiff=-1;
    for (v in s1)
        if (bestdiff < 0 || (v-$1)**2 <= bestdiff)
        {
            s11=s1[v];
            s12=s2[v];
            s13=s3[v];
            s14=s4[v];
            s15=s5[v];
            bestdiff=(v-$1)**2;
            if (bestdiff < 2){
                print $0
                print s11,s12,s13,s14,s15
            }
        }
}">diff.awk
awk -f diff.awk file2 file1
Output:
1 2 4 6 8
1 a1 b1 c5 d1
20 4 8 10 12
19 a3 b4 c2 d4
15 5 7 9 1
14 a4 b5 c1 d5
1 2
1 1
14 15
I don't know where the last three lines come from.
Here is my attempt at an answer:
function closest(b, i,    x, tmp, distance, found) { # define a function; the extra parameters are local variables
    found = ""                                 # no candidate key yet
    for (x in b) {                             # loop over the array to get its keys
        tmp = (x + 0 > i + 0) ? x - i : i - x  # absolute difference between the key and the target; +0 forces a numeric comparison
        if (found == "" || tmp < distance) {   # first key seen, or closer than the best so far: update
            distance = tmp
            found = x                          # and save the key found closest so far
        }
    }
    return found                               # return the closest key
}
{                                              # process every line of both files (no condition)
    if (NR > FNR) {                            # FNR restarts at 1 for each file, so NR > FNR means we are reading the second file
        b[$1] = $0                             # map column 1 to the whole line
    } else {
        akeys[n++] = $1                        # remember the first file's keys in input order, because for (x in array) does not guarantee any order
        a[$1] = $0                             # map column 1 to the whole line
    }
}
END {                                          # both files have been read; print the result
    for (i = 0; i < n; i++) {                  # walk the first file's keys in their original order
        print a[akeys[i]]                      # print the line from the first file
        if (akeys[i] in b) {                   # if the same key exists in the second file
            print b[akeys[i]]                  # print that line directly
        } else {
            bindex = closest(b, akeys[i])      # otherwise find the closest key in the second file
            print b[bindex]                    # and print that line
        }
    }
}
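I hope the comments make this clear enough; feel free to ask in the comments if needed.
Warning: the script can become very slow if the second file has many lines, because the whole second array is scanned for every key of the first file that has no exact match in the second file.
Given your sample input as a1 and a2: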
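The script tells its two input files apart with the NR > FNR test. A minimal sketch, using hypothetical sample files f1 and f2 in a temporary directory (assuming mktemp -d is available), shows how the two counters behave: NR keeps counting across files while FNR restarts at 1 for each file, so NR > FNR is true only from the second file on. This assumes the first file is not empty; if it is, NR equals FNR throughout the second file too and the test misfires.

```shell
#!/bin/sh
# Sketch: show how NR and FNR behave across two input files.
# f1 and f2 are hypothetical sample files, not part of the original answer.
dir=$(mktemp -d)
printf '1 2 4\n20 4 8\n' > "$dir/f1"
printf '1 a1\n2 a1\n3 a2\n' > "$dir/f2"

# NR counts records across all files; FNR restarts at 1 for each file.
# The last column is the value of the NR > FNR test (1 = true, 0 = false).
out=$(awk '{
    which = (FILENAME == ARGV[1]) ? "first" : "second"
    print which, NR, FNR, (NR > FNR)
}' "$dir/f1" "$dir/f2")
printf '%s\n' "$out"
rm -r "$dir"
```

Every line from f1 prints 0 in the last column and every line from f2 prints 1.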
$ mawk -f closest.awk a1 a2
1 2 4 6 8
1 a1 b1 c5 d1
20 4 8 10 12
19 a3 b4 c2 d4
15 5 7 9 11
14 a4 b5 c1 d5
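To work around the slowdown mentioned in the warning, one alternative (a different technique from the answer's linear scan, not something the answer itself does) is to sort the second file numerically on column 1 once and binary-search it for each key of the first file. A minimal sketch, assuming POSIX sort and awk, numeric first columns, and a non-empty second file; the temporary file1 and file2 just hold the sample data from the question. On a tie it picks the smaller key, which is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: closest-key lookup via sort + binary search.
# Sample data copied from the question; substitute your own file paths.
dir=$(mktemp -d)
printf '1 2 4 6 8\n20 4 8 10 12\n15 5 7 9 11\n' > "$dir/file1"
printf '1 a1 b1 c5 d1\n2 a1 b2 c4 d2\n3 a2 b3 c3 d3\n19 a3 b4 c2 d4\n14 a4 b5 c1 d5\n' > "$dir/file2"

# file2 is sorted on column 1 and read first from stdin ("-"),
# so NR == FNR selects its lines (assumes file2 is not empty).
out=$(sort -n -k1,1 "$dir/file2" | awk '
NR == FNR { n++; key[n] = $1 + 0; row[n] = $0; next }
{
    t = $1 + 0
    lo = 1; hi = n
    # Binary search: smallest index whose key is >= t (or n if none is).
    while (lo < hi) {
        mid = int((lo + hi) / 2)
        if (key[mid] < t) lo = mid + 1; else hi = mid
    }
    best = lo
    # The neighbour just below may be closer; on a tie prefer the smaller key.
    if (lo > 1 && t - key[lo - 1] <= key[lo] - t) best = lo - 1
    print $0
    print row[best]
}' - "$dir/file1")
printf '%s\n' "$out"
rm -r "$dir"
```

Sorting costs O(m log m) once, after which each lookup is O(log m) instead of a scan over all m keys. Exact matches need no special case, because a distance of 0 always wins, and the output keeps File1's order since File1 is read line by line after the lookup table is built.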