Sort a file in Unix by the absolute value of a field



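I want to sort this file by the absolute value of the Linear regression (p) column. My attempt did not work, and I am not sure why it failed. I found this code at http://www.unix.com/shell-programming-and-scripting/168144-sort-absolute-value.html: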

awk -F',' '{print ($2>=0)?$2:-$2, $0}' OFS=',' mycsv1.csv | sort -n -k8,8 | cut -d ',' -f2-

X var,Y var,MIC (strength),MIC-p^2 (nonlinearity),MAS (non-monotonicity),MEV (functionality),MCN (complexity),Linear regression (p)
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
...

Please help me understand the awk script so that I can sort this file.

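You can use sed and sort for this, following @hek2mgl's very clever logic of adding and then removing a field so that the original number is preserved:

 sed -E 's/,([-]?)([0-9.]+)$/,\1\2,\2/' file | sort -t, -k9,9 -nr | cut -f1-8 -d,
  • sed -E 's/,([-]?)([0-9.]+)$/,\1\2,\2/' => creates field 9 as the absolute value of field 8
  • sort -t, -k9,9 -nr => sorts by the newly created field, numerically and in descending order
  • cut -f1-8 -d, => removes the 9th field, restoring the original format in the desired sort order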

Here is the output:

AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
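
To make the first stage concrete, here is a minimal sketch that runs only the sed step on two rows from the question, so the temporary ninth field becomes visible. The function name decorate_abs is made up for illustration, and -E is assumed to be supported by your sed (GNU and BSD sed both accept it).

```shell
# decorate_abs: hypothetical helper name, not part of the original answer.
# Reads CSV rows on stdin and appends a copy of the last numeric field
# with any leading minus sign dropped.
decorate_abs() {
  sed -E 's/,(-?)([0-9.]+)$/,\1\2,\2/'
}

# Two sample rows copied from the question's data.
printf '%s\n' \
  'AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474' \
  'AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215' |
  decorate_abs
```

Each row gains a ninth field holding the eighth field without its sign. A line that does not end in a number, such as the header, is left unchanged because the pattern does not match it.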

Do it in three steps:

(1) Temporarily create a 9th field containing the absolute value of field 8:

LC_COLLATE=C awk -F, 'NR>1{v=$NF;sub(/-/,"",v);printf "%s%s%s%s",$0,FS,v,RS}' file
        ^ ------ make sure this is set since sorting, especially the decimal point
             depends on the locale.

(2) Sort the output by the 9th field:

command_1 | sort -t, -k9r

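(3) Pipe that into cut to remove the temporary last field, keeping fields 1 through 8. (An awk alternative is awk -F, -v OFS=, '{NF--}1': in gawk, NF-- decrements the number of fields, which effectively removes the last field, and 1 is always true, which makes awk print the line.)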

command_2 | cut -d, -f1-8

Output:

AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
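
The three steps can be chained into one command. The following is only a sketch under some assumptions: the file has exactly 8 comma-separated columns and a single header line, numbers use '.' as the decimal separator, and sort_by_abs_col8 is a hypothetical helper name. It passes the header through instead of dropping it, and it sorts numerically with -n rather than lexically.

```shell
# sort_by_abs_col8 FILE: hypothetical helper name.
# Assumes 8 comma-separated columns and a one-line header.
#   step 1: awk appends the last field, minus any leading '-', as field 9
#   step 2: sort orders rows by field 9, numerically, largest first
#   step 3: cut drops the temporary field 9 again
sort_by_abs_col8() {
  head -n 1 "$1"
  tail -n +2 "$1" |
    awk -F, '{ v = $NF; sub(/^-/, "", v); print $0 FS v }' |
    LC_ALL=C sort -t, -k9,9nr |
    cut -d, -f1-8
}
```

It would be called as sort_by_abs_col8 mycsv1.csv. LC_ALL=C is set on sort so that '.' is treated as the decimal point regardless of the user's locale.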

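This can also be done in awk alone (asorti requires GNU awk). Note that it prints in ascending order and keeps only one row per distinct absolute value:

awk -F, 'NR>1{n[substr($NF,1,1)=="-"?substr($NF,2):$NF]=$0}NR==1;END{k=asorti(n,out);for(i=1;i<=k;i++)print n[out[i]]}' file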
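
As for the command in the question, two things look likely to be behind the failure: it takes the absolute value of $2 (the Y var column) rather than the last column, and sort is called without -t, so it does not split on commas. Once the key has been prepended, it is also field 1 rather than field 8. Here is a corrected sketch of the same prepend, sort, strip idea, assuming header-less input on stdin and '.' as the decimal separator, with sort_abs_prefix as a made-up name:

```shell
# sort_abs_prefix: hypothetical helper name.
# Reads CSV data rows (no header) on stdin and prints them ordered by
# descending absolute value of the last field.
sort_abs_prefix() {
  # Prepend the last field without its sign as a temporary first field.
  # The key is kept as a string so no digits are lost to number formatting.
  awk -F, -v OFS=, '{ v = $NF; sub(/^-/, "", v); print v, $0 }' |
    LC_ALL=C sort -t, -k1,1nr |
    cut -d, -f2-
}
```

For the file in the question, the header could be skipped with tail -n +2 mycsv1.csv | sort_abs_prefix. Because the key is prepended, this variant does not depend on the number of columns.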
