我的文件看起来像
Tree:0,pos:0,len:2.29276,TMRCA:0.795328,ARG:,len:2.29276,TMRCA:0.795328
NEWICK_TREE: [169]((2:0.147398,(6:0.136844,(((9:0.00903981,4:0.00903981):0.084126,5:0.0931658):0.0077254,(7:0.0053182,8:0.0053182):0.095573):0.0359525):0.0105546):0.647929,(0:0.199142,(1:0.0103058,3:0.0103058):0.188836):0.596186);
SITE: 0 0.0123617064 0.648849164 0010111111
iHistoryMax: 0
Tree:1,pos:0.0169589,len:2.28476,TMRCA:0.795328,ARG:,len:2.28476,TMRCA:0.795328
NEWICK_TREE: [303]((2:0.147398,((6:0.00230499,1:0.00230499):0.134539,(((9:0.00903981,4:0.00903981):0.084126,5:0.0931658):0.0077254,(7:0.0053182,8:0.0053182):0.095573):0.0359525):0.0105546):0.647929,(0:0.199142,3:0.199142):0.596186);
iHistoryMax: 1
Tree:2,pos:0.0472255,len:2.77342,TMRCA:0.795328,ARG:,len:2.77342,TMRCA:0.795328
NEWICK_TREE: [67](((6:0.00230499,1:0.00230499):0.134539,(((9:0.00903981,4:0.00903981):0.084126,5:0.0931658):0.0077254,(7:0.0053182,8:0.0053182):0.095573):0.0359525):0.658484,((0:0.199142,3:0.199142):0.436921,2:0.636062):0.159266);
iHistoryMax: 2
Tree:3,pos:0.0539094,len:2.96385,TMRCA:0.795328,ARG:,len:2.96385,TMRCA:0.795328
NEWICK_TREE: [40](((6:0.00230499,1:0.00230499):0.134539,(((9:0.00903981,4:0.00903981):0.084126,5:0.0931658):0.0077254,(7:0.0053182,8:0.0053182):0.095573):0.0359525):0.658484,((0:0.389568,3:0.389568):0.246494,2:0.636062):0.159266);
iHistoryMax: 3
但是,我只需要每棵树的pos(在行树中:1,pos),输出应仅为1列中的pos,其中3行(或更多)。树线的位置并不总是在每个3线中,因为之间的零件长度可以改变。这可以在bash中完成?
使用awk
和:
和,
的定界符,然后打印所需的字段。例如,这将打印Tree
和pos
数字:
awk -F[:,] '/^Tree:/{print $2,$4}' file
使用grep与-p
grep -Po "(?<=Tree.*pos:)[0-9.]+" file
0
0.0169589
0.0472255
0.0539094