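In my output file, two columns holding floating-point numbers have run together into a single column. An example is shown below. Is there a way to separate these two columns?
Here there should be 5 space-separated columns, but the space between columns 3 and 4 is missing. Is there a way to correct this with UNIX commands such as cut, awk, sed, or even a regular expression?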
3.77388 0.608871 -8216.342.42161 1.88655
4.39243 0.625 -8238.241.49211 0.889258
4.38903 0.608871 -7871.71.52994 0.883976
4.286 0.653226 -8287.322.3195 2.13736
4.29313 0.629032 -7954.651.59168 1.02046
The corrected version should look like this:
3.77388 0.608871 -8216.34 2.42161 1.88655
4.39243 0.625 -8238.24 1.49211 0.889258
4.38903 0.608871 -7871.7 1.52994 0.883976
4.286 0.653226 -8287.32 2.3195 2.13736
4.29313 0.629032 -7954.65 1.59168 1.02046
More information: column 4 is always less than 10, so it has exactly one digit to the left of the decimal point.
I tried using awk:
tail -n 5 output.dat | awk '{print $3}'
-8216.342.42161
-8238.241.49211
-7871.71.52994
-8287.322.3195
-7954.651.59168
Is there a way to split this column into two?
One solution:
sed 's/\(\.[0-9]*\)\([0-9]\.\)/\1 \2/'
Using a Perl one-liner:
perl -pe 's/(\d+\.\d+)(\d\.\d+)/$1 $2/' < output.dat > fixed_output.dat
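Either substitution can be sanity-checked by confirming that every line ends up with exactly 5 whitespace-separated fields. The sketch below is only an illustration: `check_five_columns` is a hypothetical helper name, and the two sample lines are taken from the question.

```shell
# Hypothetical helper: report any input line that does not have exactly 5 fields.
# Exits non-zero if at least one such line was seen.
check_five_columns() {
  awk 'NF != 5 { printf "line %d has %d fields: %s\n", NR, NF, $0; bad = 1 }
       END { exit bad + 0 }'
}

# Example: apply the sed substitution to two sample lines, then check the result.
printf '%s\n' \
  '3.77388 0.608871 -8216.342.42161 1.88655' \
  '4.38903 0.608871 -7871.71.52994 0.883976' |
  sed 's/\(\.[0-9]*\)\([0-9]\.\)/\1 \2/' |
  check_five_columns && echo "all lines have 5 columns"
```

If the check prints nothing but the final message, the fused column was split on every line.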
Your input file:
$ cat file
3.77388 0.608871 -8216.342.42161 1.88655
4.39243 0.625 -8238.241.49211 0.889258
4.38903 0.608871 -7871.71.52994 0.883976
4.286 0.653226 -8287.322.3195 2.13736
4.29313 0.629032 -7954.651.59168 1.02046
Awk approach:
awk '{
    n = index($3, ".")                          # position of the first dot in field 3
    x = substr($3, 1, n+3) ~ /\.$/ ? n+1 : n+2  # one decimal digit if a second dot sits two characters after the first, otherwise two
    $3 = substr($3, 1, x) OFS substr($3, x+1)   # split field 3 at that position
    $0 = $0                                     # re-split the record so that NF is recalculated
    $1 = $1                                     # rebuild the record, joining fields with OFS (a tab here; the default is a single space)
}1' OFS='\t' file
Output:
3.77388 0.608871 -8216.34 2.42161 1.88655
4.39243 0.625 -8238.24 1.49211 0.889258
4.38903 0.608871 -7871.7 1.52994 0.883976
4.286 0.653226 -8287.32 2.3195 2.13736
4.29313 0.629032 -7954.65 1.59168 1.02046
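The awk script above assumes that column 3 carries one or two decimal digits. A variant that relies only on the fact stated in the question (column 4 has a single digit before its decimal point) could look like the sketch below. `split_fused` is a hypothetical helper name, and the sketch assumes both fused numbers contain a decimal point.

```shell
# Hypothetical helper: split a fused third field just before the digit
# that precedes its second dot. Reads from stdin, writes to stdout.
split_fused() {
  awk '{
    # Exactly two dots in $3 means two numbers were written without a separator.
    if (split($3, parts, /\./) == 3 && match($3, /[0-9]\.[0-9]*$/)) {
      # RSTART points at the single integer digit of the second number.
      $3 = substr($3, 1, RSTART - 1) " " substr($3, RSTART)
    }
    print
  }'
}

# Example with one line from the question.
printf '%s\n' '4.286 0.653226 -8287.322.3195 2.13736' | split_fused
# prints: 4.286 0.653226 -8287.32 2.3195 2.13736
```

Lines whose third field contains only one dot are printed unchanged, so the function can be run over a file in which only some rows are affected, for example `split_fused < output.dat > fixed_output.dat`.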