Check whether two files match on two column values and print those lines to a new output file

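I want to match two files based on the values in two columns of each file. If "BP" and "P" both match in the same row, I want to print those rows to a third file, as they appear in file 2.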

File 1:

CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704
10 110408937 4.409e+00 1.623e+00 6.602e-03 2 1 Cardiovascular rs113627704
10 110408937 2.382e+00 1.124e+00 3.414e-02 3 1 Medication rs113627704

File 2:

CHR F SNP BP P TOTAL
10 1 rs113627704 110408937 1.112e-02 456
4 1 rs43567 2345677 0.045457 567
3 1 rs567899 479899 0.3456 223

Expected output:

CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704

I tried the following two commands:

awk 'FNR==NR{a[$4,$5]=$0;next}{if(b=a[$2,$5]){print b}}' file1 file2 > file3

Here I get the error "bash: awk: command not found". I use awk all the time and it has always worked.

awk 'FNR==NR {a[$4,$5]=$0; next} ($4,$5) in a {print a[$2,$5], $0}' file1 file2 > file3

This one produces an empty file.

This should work:

$ awk 'NR==FNR{a[$4,$5]=$0;next}(($2,$5) in a)' file2 file1

Output:

CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704
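To reproduce that output, the sample data from the question can be written to a scratch directory and the command run against it. This is only a sketch: the `dir` variable and the use of `mktemp -d` are assumptions made for illustration, not part of the original post.

```shell
# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d)

cat > "$dir/file1" <<'EOF'
CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704
10 110408937 4.409e+00 1.623e+00 6.602e-03 2 1 Cardiovascular rs113627704
10 110408937 2.382e+00 1.124e+00 3.414e-02 3 1 Medication rs113627704
EOF

cat > "$dir/file2" <<'EOF'
CHR F SNP BP P TOTAL
10 1 rs113627704 110408937 1.112e-02 456
4 1 rs43567 2345677 0.045457 567
3 1 rs567899 479899 0.3456 223
EOF

# file2 is read first (NR==FNR) to build the lookup; file1 rows are then filtered.
awk 'NR==FNR{a[$4,$5]=$0;next}(($2,$5) in a)' "$dir/file2" "$dir/file1" > "$dir/file3"

cat "$dir/file3"
```

The header line is printed only because both headers happen to carry BP and P in the keyed columns, so the header of file1 is found in the lookup like any other row.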

Explanation:

$ awk '
NR==FNR {             # first file (file2): build the lookup table
    a[$4,$5]=$0       # key on its 4th and 5th fields (BP and P)
    next              # skip to the next record
}                     # everything below runs only for file1
(($2,$5) in a)        # print file1 lines whose 2nd and 5th fields are a key
' file2 file1         # mind the file order
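One caveat about the hash key: awk builds it from the text of the fields, so the comparison is textual. A P value written as `0.01112` in one file and `1.112e-02` in the other would not match. The sketch below uses hypothetical data to show this, along with one possible workaround, adding 0 to force a numeric conversion. That workaround is an assumption of this example, not something from the answer, and with the default `CONVFMT` of `%.6g` it compares values to only six significant digits.

```shell
# Hypothetical data: the same P value written two different ways.
dir=$(mktemp -d)
printf '%s\n' 'CHR BP BETA SE P' \
              '10 110408937 3.386e+00 1.333e+00 0.01112' > "$dir/file1"
printf '%s\n' 'CHR F SNP BP P TOTAL' \
              '10 1 rs113627704 110408937 1.112e-02 456' > "$dir/file2"

# Textual keys: only the header matches, the data row does not.
awk 'NR==FNR{a[$4,$5];next}(($2,$5) in a)' \
    "$dir/file2" "$dir/file1" > "$dir/text_match"

# Numeric keys: adding 0 makes awk convert each field to a number before
# it is turned back into a string for the array subscript.
awk 'NR==FNR{a[$4+0,$5+0];next}(($2+0,$5+0) in a)' \
    "$dir/file2" "$dir/file1" > "$dir/num_match"

wc -l "$dir/text_match" "$dir/num_match"
```

In the numeric variant the header still passes through, because the non-numeric header fields all convert to 0 in both files.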

The word awk in your command contains some invisible characters:

awk 'FNR==NR{a[$4,$5]=$0;next}{if(b=a[$2,$5]){print b}}' file1 file2 > file3

Using the string copied from your command:

$ type awk
-bash: type: awk: not found

Typing awk by hand:

$ type awk
awk is hashed (/usr/bin/awk)
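A quick way to confirm this kind of problem is to look at the raw bytes of the pasted word. The sketch below assumes the stray character is a zero-width space (U+200B). That is only a guess for illustration, since the post does not say which character was actually present, and the variable names `pasted` and `typed` are hypothetical.

```shell
# Hypothetical pasted command name: a zero-width space (U+200B, UTF-8 bytes
# e2 80 8b) followed by "awk". The actual character in the question is unknown.
pasted=$(printf '\342\200\213awk')
typed='awk'

# The two look alike on screen but differ in byte length.
printf '%s' "$pasted" | wc -c
printf '%s' "$typed" | wc -c

# od prints every byte, so the extra ones become visible.
printf '%s' "$pasted" | od -An -c
```

If the byte count of the pasted word is larger than 3, retyping the command by hand, as shown above, removes the hidden characters.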
