I have a file named bin.001.fasta that looks like this:
>contig_655
GGCGGTTATTTAGTATCTGCCACTCAGCCTCGCTATTATGCGAAATTTGAGGGCAGGAGGAAACCATGAC
AGTAGTCAAGTGCGACAAGC
>contig_866
CCCAGACCTTTCAGTTGTTGGGTGGGGTGGGTGCTGACCGCTGGTGAGGGCTCGACGGCGCCCATCCTGG
CTAGTTGAAC
...
What I want is a new file where the first column is the retrieved contig ID and the second column is the file name without .fasta:
contig_655 bin.001
contig_866 bin.001
Any ideas?
Could you please try the following.
awk -F'>' '
FNR==1{
split(FILENAME,array,".")
file=array[1]"."array[2]
}
/^>/{
print $2,file
}
' Input_file
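As a quick check, here is a minimal sketch that recreates the sample file from the question in a temporary directory, runs the command above on it, and redirects the output into a new file, which is what the question asks for. It assumes a POSIX shell with awk and mktemp available; contig_ids.txt is a hypothetical output name.

```shell
#!/bin/sh
# Minimal sketch: rebuild the sample bin.001.fasta in a temp directory,
# run the first awk command on it and save the result to a new file.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Sample input copied from the question.
cat > bin.001.fasta <<'EOF'
>contig_655
GGCGGTTATTTAGTATCTGCCACTCAGCCTCGCTATTATGCGAAATTTGAGGGCAGGAGGAAACCATGAC
AGTAGTCAAGTGCGACAAGC
>contig_866
CCCAGACCTTTCAGTTGTTGGGTGGGGTGGGTGCTGACCGCTGGTGAGGGCTCGACGGCGCCCATCCTGG
CTAGTTGAAC
EOF

# Same program as above; stdout goes to a new file.
# contig_ids.txt is a hypothetical output name.
awk -F'>' '
FNR==1{
  split(FILENAME,array,".")
  file=array[1]"."array[2]
}
/^>/{
  print $2,file
}
' bin.001.fasta > contig_ids.txt

cat contig_ids.txt
# contig_655 bin.001
# contig_866 bin.001
```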
Or, if your Input_file name contains more than 2 dots, run the following command instead.
awk -F'>' '
FNR==1{
match(FILENAME,/.*\./) ##Escaped dot: greedy match up to the LAST literal dot in the file name.
file=substr(FILENAME,RSTART,RLENGTH-1)
}
/^>/{
print $2,file
}
' Input_file
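To see why the second form exists, the sketch below runs both commands on a hypothetical file name with more than two dots, bin.001.v2.fasta. The first keeps only the first two dot-separated pieces, while the second keeps everything before the last dot. This assumes the regex escapes the dot as `/.*\./`; with an unescaped dot, `.*.` matches the whole name and RLENGTH-1 would only trim the final character (giving bin.001.v2.fast).

```shell
#!/bin/sh
# Minimal sketch: compare both approaches on a file name with more than
# two dots. bin.001.v2.fasta is a hypothetical name for illustration.
set -e
workdir=$(mktemp -d)
cd "$workdir"
printf '>contig_1\nACGT\n' > bin.001.v2.fasta

# First approach: keeps only the first two dot-separated pieces.
first=$(awk -F'>' '
FNR==1{ split(FILENAME,array,"."); file=array[1]"."array[2] }
/^>/{ print $2,file }
' bin.001.v2.fasta)

# Second approach: ".*\." greedily matches up to the LAST dot, and
# RLENGTH-1 drops that dot, so only the final extension is removed.
second=$(awk -F'>' '
FNR==1{ match(FILENAME,/.*\./); file=substr(FILENAME,RSTART,RLENGTH-1) }
/^>/{ print $2,file }
' bin.001.v2.fasta)

echo "$first"   # contig_1 bin.001
echo "$second"  # contig_1 bin.001.v2
```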
Explanation: here is a detailed explanation of the above code.
awk -F'>' ' ##Starting awk program here and setting > as the field separator for all lines.
FNR==1{ ##If this is the first line of the file, do the following.
split(FILENAME,array,".") ##Split the file name passed to awk into an array named array, using . as the delimiter.
file=array[1]"."array[2] ##Create variable file from the 1st and 2nd array elements joined by a dot, as per the OP's sample.
}
/^>/{ ##If a line starts with >, do the following.
print $2,file ##Print the 2nd field (the contig ID) and the value of variable file.
}
' Input_file ##Mention the Input_file name here.
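If there are many bin files, one possible variant (not part of the original answer) is to pass them all to a single awk call and build the name with sub(), stripping any leading directory and only a trailing .fasta. This is a sketch under assumptions: the bins/ directory, the second file bin.002.fasta and the output name contig_to_bin.txt are hypothetical.

```shell
#!/bin/sh
# Minimal sketch of a multi-file variant. bins/, bin.002.fasta and
# contig_to_bin.txt are hypothetical names used only for illustration.
set -e
workdir=$(mktemp -d)
cd "$workdir"
mkdir bins
printf '>contig_655\nACGT\n>contig_866\nTTGA\n' > bins/bin.001.fasta
printf '>contig_12\nGGCC\n' > bins/bin.002.fasta

awk -F'>' '
FNR==1{                    # runs on the first line of EACH input file
  file=FILENAME
  sub(/^.*\//,"",file)     # drop the directory part, e.g. "bins/"
  sub(/\.fasta$/,"",file)  # drop only the trailing .fasta
}
/^>/{ print $2,file }      # header line: contig ID, then bin name
' bins/*.fasta > contig_to_bin.txt

cat contig_to_bin.txt
# contig_655 bin.001
# contig_866 bin.001
# contig_12 bin.002
```

Using FNR rather than NR matters here: FNR restarts at 1 for every file, so the name is recomputed per file.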