Text manipulation with AWK



Input file (example_file.txt):

chr20:1000026:T:C, 0.997, 0, 0.998, 0, 0.013, 0.980, 0.989, 1.000, 0, 0.995
chr20:10000775:A:G, 1.000, 0, 0.938, 0, 0, 0.982, 0, 0, 1.985, 1.180

Desired output (using awk):

chr20:1000026:T:C, C, T, 0.997, 0, 0.998, 0, 0.013, 0.980, 0.989, 1.000, 0, 0.995
chr20:10000775:A:G, G, A, 1.000, 0, 0.938, 0, 0, 0.982, 0, 0, 1.985, 1.180

I can get the desired output with:

awk '{print $1}' example_file.txt > file1.tmp
awk -F'[:,]' '{print $4",", $3","}' example_file.txt > file2.tmp
awk '{print $2, $3, $4, $5, $6, $7, $8, $9, $10, $11}' example_file.txt > file3.tmp
paste -d' ' file1.tmp file2.tmp file3.tmp > output.file

output.file:

chr20:1000026:T:C, C, T, 0.997, 0, 0.998, 0, 0.013, 0.980, 0.989, 1.000, 0, 0.995
chr20:10000775:A:G, G, A, 1.000, 0, 0.938, 0, 0, 0.982, 0, 0, 1.985, 1.180

However, this approach is fragmented and verbose, and the real input file has far more than 11 columns.

Using split() on $1 and prepending the parts to $2:

$ awk '
BEGIN {
    FS=OFS=", "             # proper field delimiters
}
{
    n=split($1,a,/:/)       # get parts of first field
    for(i=3;i<=n;i++)       # from the 3rd part on
        $2=a[i] OFS $2      # prepend to 2nd field
}1' file                    # output

Output:

chr20:1000026:T:C, C, T, 0.997, 0, 0.998, 0, 0.013, 0.980, 0.989, 1.000, 0, 0.995
chr20:10000775:A:G, G, A, 1.000, 0, 0.938, 0, 0, 0.982, 0, 0, 1.985, 1.180
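If the first field always has exactly four colon-separated parts, as in the two sample rows, a fixed-position variant may be a little easier to read. This is only a sketch under that assumption: it appends the fourth and third parts to $1 instead of looping, and it writes to output.file as in the question. The column count does not matter, because awk rebuilds the record from all fields when $1 is assigned.

```shell
# Recreate the two sample rows from the question
cat > example_file.txt <<'EOF'
chr20:1000026:T:C, 0.997, 0, 0.998, 0, 0.013, 0.980, 0.989, 1.000, 0, 0.995
chr20:10000775:A:G, 1.000, 0, 0.938, 0, 0, 0.982, 0, 0, 1.985, 1.180
EOF

awk '
BEGIN {
    FS = OFS = ", "              # fields are separated by comma + space
}
{
    split($1, a, ":")            # a[1]..a[4]: the colon-separated parts of $1
    $1 = $1 OFS a[4] OFS a[3]    # assumption: exactly four parts; append 4th, then 3rd
}
1' example_file.txt > output.file   # the trailing 1 prints the rebuilt record

cat output.file
```

If the number of colon-separated parts can vary, the loop-based version above is the safer choice.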
