I have two files that I am trying to join/merge based on columns 1 and 2.
input1
22 42960000 rs149201999 A AC 100 PASS LDAF=0.0649;RSQ=0.8652;AN=2184;ERATE=0.0046;VT=SNP;AA=.;AVGPOST=0.9799;THETA=0.0149;SNPSOURCE=LOWCOV;AC=134;AF=0.06;ASN_AF=0.04;AMR_AF=0.05;AFR_AF=0.10;EUR_AF=0.06
input2
22 42960000 . A AC . . ;AA=1;AFE=0.989691;ASNE=1;EUN=0.992509;AFW=1;MED=0.991071;LAT=1
The output would be:
22 42960000 . A AC . . ;AA=1;AFE=0.989691;ASNE=1;EUN=0.992509;AFW=1;MED=0.991071;LAT=1;LDAF=0.0649;RSQ=0.8652;AN=2184;ERATE=0.0046;VT=SNP;AA=.;AVGPOST=0.9799;THETA=0.0149;SNPSOURCE=LOWCOV;AC=134;AF=0.06;ASN_AF=0.04;AMR_AF=0.05;AFR_AF=0.10;EUR_AF=0.06
Note that the columns are separated by tabs.
Here's one way using GNU awk:
awk 'FNR==NR { array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' input1 input2
Results:
22 42960000 . A AC . . ;AA=1;AFE=0.989691;ASNE=1;EUN=0.992509;AFW=1;MED=0.991071;LAT=1;LDAF=0.0649;RSQ=0.8652;AN=2184;ERATE=0.0046;VT=SNP;AA=.;AVGPOST=0.9799;THETA=0.0149;SNPSOURCE=LOWCOV;AC=134;AF=0.06;ASN_AF=0.04;AMR_AF=0.05;AFR_AF=0.10;EUR_AF=0.06
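To spell out what the one-liner does: while reading the first file (`FNR==NR`), it stores column 8 under the key built from columns 1 and 2; while reading the second file, it appends the stored value to any line whose key was seen. The sketch below is a variant, not the exact command above. It assumes you want tab to be the only field separator and you want to keep lines from input2 that have no match in input1. The sample rows are shortened and the second input2 row is made up for illustration.

```shell
# Hypothetical, shortened sample data in a scratch directory
tmpdir=$(mktemp -d)
printf '22\t42960000\trs149201999\tA\tAC\t100\tPASS\tLDAF=0.0649;AF=0.06\n' > "$tmpdir/input1"
printf '22\t42960000\t.\tA\tAC\t.\t.\t;AA=1;LAT=1\n22\t50000000\t.\tG\tT\t.\t.\t;AA=1\n' > "$tmpdir/input2"

merged=$(awk '
BEGIN { FS = OFS = "\t" }                   # split and rebuild records on tabs only
FNR == NR { info[$1, $2] = $8; next }       # first file: remember column 8 per (col1, col2)
($1, $2) in info { $8 = $8 ";" info[$1, $2] }  # second file: append on a key match
{ print }                                   # assumption: unmatched lines are kept as-is
' "$tmpdir/input1" "$tmpdir/input2")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

If unmatched lines should be dropped, as in the one-liner, move the `print` into the `($1, $2) in info` block.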
This should work:
# Placeholder that temporarily glues columns 1 and 2 into a single join key
s=%%%%%%
join -j1 -o1.1,1.2,1.3,1.4,1.5,1.6,1.7,2.7 <(sed "s/\t/$s/" input2) \
    <(sed "s/\t/$s/" input1) \
    | sed "s/$s/\t/;
           s/\(=[^ ]*\) \([^ ]*=\)/\1;\2/;
           s/ \+/\t/g"
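Keep in mind that `join` expects both inputs to be sorted on the join field, and that process substitution and `\t` inside `sed` are bash and GNU sed features. As a rough sketch of the same placeholder idea that avoids both, the version below uses temporary files and a literal tab, sorts the keyed files, and joins on tab so that only the last separator needs to become `;`. The sample rows, the `tmpdir`, `k1`, and `k2` names are made up for illustration.

```shell
tab=$(printf '\t')      # literal tab, so sed does not need to understand \t
sep='%%%%%%'            # placeholder assumed not to occur in the data
tmpdir=$(mktemp -d)

# Hypothetical, shortened sample rows
printf '22\t100\trs1\tA\tC\t100\tPASS\tX=1;Y=2\n' > "$tmpdir/input1"
printf '22\t100\t.\tA\tC\t.\t.\tZ=3\n' > "$tmpdir/input2"

# Glue columns 1 and 2 into one key, then sort because join needs sorted input
sed "s/$tab/$sep/" "$tmpdir/input1" | sort > "$tmpdir/k1"
sed "s/$tab/$sep/" "$tmpdir/input2" | sort > "$tmpdir/k2"

# Join on the glued key, restore the first tab, and turn the last tab into ';'
joined=$(join -t "$tab" -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,2.7 "$tmpdir/k2" "$tmpdir/k1" \
    | sed -e "s/$sep/$tab/" -e "s/$tab\([^$tab]*\)\$/;\1/")

printf '%s\n' "$joined"
rm -rf "$tmpdir"
```

Because the glued key shifts every later column left by one, field 7 of each keyed file is the original column 8.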