我有一个.csv格式的大型数据表(见下文(。如果每列和行中的数字小于某个值,我想用零替换它,比如说 0.1。
谁能帮我一把?多谢。
我想可以用sed
完成,如本例所示
BCC_ACR_CR BCC_ACR_CU BCC_ACR_FE BCC_ACR_MN BCC_ACR_MO
0.2826027 3.89E-12 0.58420346 2.23E-13 0.2105587
0.27986588 3.80E-12 0.58501168 2.27E-13 0.20890705
0.27986588 3.80E-12 0.58501168 2.27E-13 0.20890705
0.27986588 3.80E-12 0.58501168 2.27E-13 0.20890705
0.28038733 3.81E-12 0.58196375 5.88E-05 0.21239142
0.26855376 3.27E-12 0.60364524 2.06E-13 0.11205138
0.27220042 3.28E-12 0.60349573 2.08E-13 0.11530944
0.36294869 3.14E-12 0.50515464 1.64E-13 3.12E-12
0.36294869 3.14E-12 0.50515464 1.64E-13 3.12E-12
0.40837234 3.07E-12 0.47202708 1.73E-13 3.03E-12
0.3643896 3.25E-12 0.50431576 1.63E-13 3.14E-12
0.3643896 3.25E-12 0.50431576 1.63E-13 3.14E-12
0.35885258 3.21E-12 0.50978952 1.64E-13 3.12E-12
这是一个用于 GNU awk 的。假定字段分隔符是一系列空格,因此不允许空字段:
$ gawk -v value=0.1 ' # give treshold values as parameter
BEGIN { RS="( +n?|n)" } # every field is considered a record
{
ORS=RT # RT stores actualized RS
if($0<value) # comparison
$0=sprintf("0%-" length()-1 "s","") # format data to fit the field
}1' file # output
输出:
BCC_ACR_CR BCC_ACR_CU BCC_ACR_FE BCC_ACR_MN BCC_ACR_MO
0.2826027 0 0.58420346 0 0.2105587
0.27986588 0 0.58501168 0 0.20890705
0.27986588 0 0.58501168 0 0.20890705
0.27986588 0 0.58501168 0 0.20890705
0.28038733 0 0.58196375 0 0.21239142
0.26855376 0 0.60364524 0 0.11205138
0.27220042 0 0.60349573 0 0.11530944
0.36294869 0 0.50515464 0 0
0.36294869 0 0.50515464 0 0
0.40837234 0 0.47202708 0 0
0.3643896 0 0.50431576 0 0
0.3643896 0 0.50431576 0 0
0.35885258 0 0.50978952 0 0