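I want to sort this file by the absolute value of the Linear regression (p) column. My attempt did not work, and I am not sure why it failed. I found this code at http://www.unix.com/shell-programming-and-scripting/168144-sort-absolute-value.html: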
awk -F',' '{print ($2>=0)?$2:-$2, $0}' OFS=',' mycsv1.csv | sort -n -k8,8 | cut -d ',' -f2-
X var,Y var,MIC (strength),MIC-p^2 (nonlinearity),MAS (non-monotonicity),MEV (functionality),MCN (complexity),Linear regression (p)
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
...
Please help me understand the awk script so that I can sort this file.
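As far as I can tell, the attempt fails for three reasons: it takes the "absolute value" of $2 (the Y var column, which is text) instead of $8; sort is not told that the delimiter is a comma; and once a value is prepended, the sort key is field 1, not field 8. A minimal sketch of a corrected version follows. It assumes a temporary sample file holding the header and three of the rows above; the function name sort_by_abs_p is hypothetical.

```shell
# Hypothetical sample file: the header plus three rows from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
X var,Y var,MIC (strength),MIC-p^2 (nonlinearity),MAS (non-monotonicity),MEV (functionality),MCN (complexity),Linear regression (p)
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
EOF

# sort_by_abs_p FILE: print the data rows of FILE ordered by |field 8|, largest first.
sort_by_abs_p() {
    # Decorate: prepend |$8| as a new first field; NR > 1 skips the header.
    # OFMT is raised so a computed value is not rounded to 6 digits on output.
    awk -F, -v OFS=, -v OFMT='%.15g' 'NR > 1 { a = ($8 < 0) ? -$8 : $8; print a, $0 }' "$1" |
        LC_ALL=C sort -t, -k1,1 -nr |   # sort numerically on the prepended field
        cut -d, -f2-                    # undecorate: drop the prepended field
}

sort_by_abs_p "$sample"
```

The header is dropped here to keep the sketch short; it can be printed separately with head -n 1 if it is needed in the output.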
You can use sed and sort for this, following @hek2mgl's very clever logic of adding and then removing a field so that the original numbers are preserved:
sed -E 's/,([-]?)([0-9.]+)$/,\1\2,\2/' file | sort -t, -k9,9 -nr | cut -f1-8 -d,
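- sed -E 's/,([-]?)([0-9.]+)$/,\1\2,\2/' => creates field 9 as the absolute value of field 8
- sort -t, -k9,9 -nr => sorts on the newly created field, numerically and in descending order
- cut -f1-8 -d, => removes the 9th field, restoring the original format in the desired sort order
Here is the output: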
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
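One detail the pipeline leaves open is the header line: it does not end in a number, so it gets no 9th field and is sorted along with the data. A minimal sketch that keeps the header on top, assuming a temporary sample file with three of the rows above and a hypothetical function name abs_sort_keep_header:

```shell
# Hypothetical sample file: the header plus three rows from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
X var,Y var,MIC (strength),MIC-p^2 (nonlinearity),MAS (non-monotonicity),MEV (functionality),MCN (complexity),Linear regression (p)
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
EOF

# abs_sort_keep_header FILE: header first, then rows by |field 8|, largest first.
abs_sort_keep_header() {
    head -n 1 "$1"                                  # emit the header untouched
    tail -n +2 "$1" |                               # data rows only
        sed -E 's/,(-?)([0-9.]+)$/,\1\2,\2/' |      # append |field 8| as field 9
        LC_ALL=C sort -t, -k9,9 -nr |               # numeric, descending, on field 9
        cut -d, -f1-8                               # drop the temporary field
}

abs_sort_keep_header "$sample"
```

LC_ALL=C is set on sort so that the decimal point is interpreted the same way regardless of the user's locale.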
Do it in three steps:
(1) Temporarily create a 9th field that contains the absolute value of field 8:
LC_COLLATE=C awk -F, 'NR>1{v=$NF;sub(/-/,"",v);printf "%s%s%s%s",$0,FS,v,RS}' file
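^ ------ make sure this is set, since sorting (especially the handling of the
decimal point) depends on the locale. Written as a prefix, it applies only to the
command it precedes, so set it on the sort command in step (2) as well.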
(2) Sort that output on the 9th field:
command_1 | sort -t, -k9r
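(3) Pipe that into cut to drop the temporary last field. In awk the same can be done with NF--, which decrements the number of fields and thereby removes the last one in most awks, followed by 1, which is always true and makes awk print the line; the command below uses cut instead: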
command_2 | cut -d, -f1-8
Output:
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
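The three steps can be chained into a single pipeline. The sketch below is a variation, not the exact commands above: the locale is set on sort itself, and the key is compared numerically (-k9,9nr). It assumes a temporary sample file with three of the rows above; the function name three_step_abs_sort is hypothetical.

```shell
# Hypothetical sample file: the header plus three rows from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
X var,Y var,MIC (strength),MIC-p^2 (nonlinearity),MAS (non-monotonicity),MEV (functionality),MCN (complexity),Linear regression (p)
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
EOF

# three_step_abs_sort FILE: data rows of FILE by |last field|, largest first.
three_step_abs_sort() {
    # (1) Append the last field without its leading minus as a temporary 9th field.
    awk -F, 'NR > 1 { v = $NF; sub(/^-/, "", v); print $0 FS v }' "$1" |
        LC_ALL=C sort -t, -k9,9nr |   # (2) numeric, reverse sort on field 9 only
        cut -d, -f1-8                 # (3) remove the temporary field
}

three_step_abs_sort "$sample"
```

Because sub() works on the text of the field, the temporary key keeps every digit of the original value.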
You could also do this in awk alone (GNU awk is needed, since asorti is a gawk extension):
awk -F, 'NR>1{n[substr($NF,1,1)=="-"?substr($NF,2):$NF]=$0}NR==1;END{k=asorti(n,out);for(i=1;i<=k;i++)print n[out[i]]}' file
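The one-liner above relies on asorti, which is a GNU awk extension, and it keys rows by their absolute value, so two rows with the same absolute value would overwrite each other. A portable sketch that avoids both issues is below. It swaps in a plain insertion sort (quadratic, so only sensible for modest files) and sorts in descending order; the function name awk_abs_sort and the sample file are hypothetical.

```shell
# Hypothetical sample file: the header plus three rows from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
X var,Y var,MIC (strength),MIC-p^2 (nonlinearity),MAS (non-monotonicity),MEV (functionality),MCN (complexity),Linear regression (p)
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
EOF

# awk_abs_sort FILE: header first, then rows by |last field|, largest first.
awk_abs_sort() {
    awk -F, '
    NR == 1 { print; next }          # pass the header through unchanged
    {
        key = $NF
        sub(/^-/, "", key)           # absolute value: strip a leading minus
        n++
        keys[n] = key + 0            # store as a number for numeric comparison
        rows[n] = $0                 # rows are stored by position, so ties survive
    }
    END {
        # Insertion sort on keys[], descending, moving rows[] in step.
        for (i = 2; i <= n; i++) {
            k = keys[i]; r = rows[i]
            for (j = i - 1; j >= 1 && keys[j] < k; j--) {
                keys[j + 1] = keys[j]
                rows[j + 1] = rows[j]
            }
            keys[j + 1] = k
            rows[j + 1] = r
        }
        for (i = 1; i <= n; i++) print rows[i]
    }' "$1"
}

awk_abs_sort "$sample"
```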