I want to match $1 in one file against $1 in another file, then count the number of matches where $2(File1) < $2(File2) < $3(File1), and do this for every matching row.
File 1: segments
Chromosome Start End Value
chr1 0 121347754 -0.009727287106215954
chr1 144009053 249250621 0.18180939555168152
chr2 0 90278124 -0.0197499617934227
chr2 95387134 243199373 -0.009399870410561562
chr3 0 91000000 -0.015508042648434639
chr3 93541117 198022430 0.011255052872002125
chr4 0 49064792 -0.02086501568555832
chr4 52700771 143350756 0.013872206211090088
chr4 143350756 191154276 -0.004134085960686207
File 2: probes
Chromosome Start End Value Array
chr1 798959 798959 1.0 0
chr1 1048955 1048955 0.0 0
chr1 1158277 1158277 0.0 0
chr1 1314015 1314015 0.5307189226150513 0
chr1 1489928 1489928 0.45127609372138977 0
chr1 1499298 1499298 1.0 0
chr1 1948400 1948400 0.0 0
chr1 2021114 2021114 0.0 0
chr1 2056735 2056735 0.0 0
So the output would be:
$1(matching both File 1 and 2) $2(File1) $3(File1) $4(number of matches)
Output
Chromosome Start End Probes
chr1 0 121347754 238
chr1 144009053 249250621 590
chr2 0 90278124 321
I have been trying to do this with awk, but it isn't working!
This is what I have so far:
awk 'FNR==NR{a[$1]=$1 FS $2;next}{ print $1[File1] "\t" $2[File1] "\t" $3[File1] "\t" $2[File1] < $2[File2] < $3[File1] }' File1 File2
Another way, using awk:
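awk 'BEGIN {print "Chromosome Start End Probes"}
NR==FNR{a[$1]=a[$1]==""?$2:a[$1] FS $2;next}   # file2: collect probe positions per chromosome
{ delete c                                      # reset the counter for this segment
n=split(a[$1],b,FS)                             # n = number of probes on this chromosome
for (i=1;i<=n;i++)
if (b[i]>$2&&b[i]<$3) c[$1]++                   # probe lies strictly inside (Start, End)
if (c[$1])print $1,$2,$3,c[$1]                  # print only segments with at least one probe
}' file2 file1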
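Explanation
- BEGIN {print "Chromosome Start End Probes"}: prints the header.
- NR==FNR{a[$1]=a[$1]==""?$2:a[$1] FS $2;next}: reads file2 and appends each $2 value to array a under the key $1.
- split(a[$1],b,FS): splits the value stored in a[$1] into array b.
- if (b[i]>$2&&b[i]<$3) c[$1]++: counts the probes that fall strictly between $2 and $3 of the current file1 row.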
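To check the approach on a small input, here is a minimal sketch that runs a variant of the awk program on the first three segment rows and the first three probe rows shown in the question. The function name count_probes and the temporary file names are hypothetical, chosen only for this illustration. The variant also skips the header row of each file explicitly and uses a plain counter instead of the array c, which is an assumption about what is wanted rather than part of the original answer.

```shell
#!/bin/sh
# count_probes is a hypothetical helper name used only for this sketch.
# Usage: count_probes PROBES_FILE SEGMENTS_FILE
count_probes() {
    awk 'BEGIN { print "Chromosome Start End Probes" }
    FNR == 1 { next }                 # skip the header row of each file
    NR == FNR {                       # first file: probes
        a[$1] = (a[$1] == "" ? $2 : a[$1] FS $2)
        next
    }
    {                                 # second file: segments
        count = 0
        n = split(a[$1], b, FS)       # probe positions on this chromosome
        for (i = 1; i <= n; i++)
            if (b[i] > $2 && b[i] < $3) count++
        if (count) print $1, $2, $3, count
    }' "$1" "$2"
}

# Sample rows copied from the question, written to temporary files.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
Chromosome Start End Value
chr1 0 121347754 -0.009727287106215954
chr1 144009053 249250621 0.18180939555168152
chr2 0 90278124 -0.0197499617934227
EOF
cat > "$tmp/file2" <<'EOF'
Chromosome Start End Value Array
chr1 798959 798959 1.0 0
chr1 1048955 1048955 0.0 0
chr1 1158277 1158277 0.0 0
EOF

result=$(count_probes "$tmp/file2" "$tmp/file1")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With these rows, all three probes fall inside the first chr1 segment, so the script prints the header followed by chr1 0 121347754 3. Segments that contain no probes are omitted, as in the answer above. Note that every segment row rescans all probes on its chromosome, which is fine for files of this size but may be slow for very large probe lists.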