Input file (example_file.txt):
chr20:1000026:T:C, 0.997, 0, 0.998, 0, 0.013, 0.980, 0.989, 1.000, 0, 0.995
chr20:10000775:A:G, 1.000, 0, 0.938, 0, 0, 0.982, 0, 0, 1.985, 1.180
Desired output (using awk):
chr20:1000026:T:C, C, T, 0.997, 0, 0.998, 0, 0.013, 0.980, 0.989, 1.000, 0, 0.995
chr20:10000775:A:G, G, A, 1.000, 0, 0.938, 0, 0, 0.982, 0, 0, 1.985, 1.180
I can get the desired output with:
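awk '{print $1}' example_file.txt > file1.tmp
# split on ":" and "," so that $3 and $4 are the single allele letters
awk -F'[:,]' '{print $4",", $3","}' example_file.txt > file2.tmp
awk '{print $2, $3, $4, $5, $6, $7, $8, $9, $10, $11}' example_file.txt > file3.tmp
# join the pieces with a space rather than paste's default tab
paste -d' ' file1.tmp file2.tmp file3.tmp > output.file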
output.file:
chr20:1000026:T:C, C, T, 0.997, 0, 0.998, 0, 0.013, 0.980, 0.989, 1.000, 0, 0.995
chr20:10000775:A:G, G, A, 1.000, 0, 0.938, 0, 0, 0.982, 0, 0, 1.985, 1.180
However, this approach is fragmented and verbose, and the real input file has far more than 11 columns.
Another approach, using split on $1 and prepending the parts to $2:
$ awk '
BEGIN {
    FS=OFS=", "              # proper field delimiters
}
{
    n=split($1,a,/:/)        # get parts of first field
    for(i=3;i<=n;i++)        # from the 3rd part on
        $2=a[i] OFS $2       # prepend to 2nd field
}1' file                     # output
Output:
chr20:1000026:T:C, C, T, 0.997, 0, 0.998, 0, 0.013, 0.980, 0.989, 1.000, 0, 0.995
chr20:10000775:A:G, G, A, 1.000, 0, 0.938, 0, 0, 0.982, 0, 0, 1.985, 1.180
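For comparison, here is a minimal sketch of a variant that appends the two allele letters to $1 instead of prepending them to $2. It assumes every first field has exactly four colon-separated parts, as in the sample rows. The here-document only recreates the two sample rows so that the snippet runs on its own. Because only $1 is reassigned, the remaining fields pass through untouched, however many columns there are.

```shell
# Recreate the two sample rows from the question.
cat > example_file.txt <<'EOF'
chr20:1000026:T:C, 0.997, 0, 0.998, 0, 0.013, 0.980, 0.989, 1.000, 0, 0.995
chr20:10000775:A:G, 1.000, 0, 0.938, 0, 0, 0.982, 0, 0, 1.985, 1.180
EOF

# Assumption: $1 always looks like chrom:pos:ref:alt (four parts).
awk '
BEGIN { FS = OFS = ", " }            # same delimiters as above
{
    split($1, a, ":")                # a[3] = 3rd part, a[4] = 4th part
    $1 = $1 OFS a[4] OFS a[3]        # append 4th, then 3rd; $0 is rebuilt with OFS
}
1' example_file.txt > output.file    # the bare 1 prints every record

cat output.file
```

Unlike the loop version, this one hard-codes the order (4th part, then 3rd), so it would need adjusting if the first field could contain more than four parts.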