One-liner to sort fields within a line by a trailing numeric suffix



I have a series of lines with the following syntax:

sweets   apple:11   banana:9   cherry:101   donut:1   egg tart:86   
tossed   added:5   anted:13   ashley:3   bandied:3   flung:6   lobbed:4   salad:26   slung:9
plenty   abundance:3   a lot:83   ample:12   aroar:3   a ton:12   enow:5   gobs:5   lots:27   lotsa:8   

(the large spaces are all tabs)

The desired output is columns 2+ sorted numerically by the number after the colon, largest first.

For example:
sweets   cherry:101   egg tart:86   apple:11   banana:9   donut:1   
tossed   salad:26   anted:13   slung:9   flung:6   added:5   lobbed:4   ashley:3   bandied:3
plenty   a lot:83   lots:27   a ton:12   ample:12   lotsa:8   enow:5   gobs:5   abundance:3   aroar:3   

I use Ruby one-liners a lot. For example:

# alphabetize within a line, delimited by pipes "|"
ruby -pe '$_=$_.strip.split("|").sort().join("|")+"\n"'
# case-insensitive, with no dupes:
ruby -pe '$_=$_.strip.split("|").sort_by{|x| x.downcase }.uniq.join("|")+"\n"'
# keep the first term:
ruby -pe '$_=$_.split(":")[0].strip+":"+$_.split(":")[1].strip.split("|").sort.join("|")+"\n"'

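But I can't quite work out a simple, clean way to sort by the trailing number, i.e. the ":NN". I believe it can be done in a few characters. How? I'd also be happy with an awk solution, but Ruby is usually cleaner for more complex processing.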
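As a reference point for the answers below, here is a minimal Ruby sketch of the requested behavior: keep the first tab-separated field and sort the rest by the digits after their final colon, largest first. The method name sort_line is hypothetical, and breaking ties on the field text is an assumption, since the question does not say how equal numbers should be ordered.

```ruby
# Minimal sketch (sort_line is a hypothetical helper, not from the question).
# Keeps the first tab-separated field and sorts the remaining fields by the
# digits after their final ":", largest first. Ties fall back to the field
# text itself (an assumption: the question does not define a tie order).
def sort_line(line)
  head, *items = line.chomp.split("\t") # String#split drops trailing empty fields
  sorted = items.sort_by do |item|
    num = item[/:(\d+)\z/, 1].to_i      # nil.to_i is 0 if there is no suffix
    [-num, item]                        # negate the number for descending order
  end
  [head, *sorted].join("\t")
end

samples = [
  "sweets\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86\t",
  "tossed\tadded:5\tanted:13\tashley:3\tbandied:3\tflung:6\tlobbed:4\tsalad:26\tslung:9",
  "plenty\tabundance:3\ta lot:83\tample:12\taroar:3\ta ton:12\tenow:5\tgobs:5\tlots:27\tlotsa:8\t"
]
samples.each { |l| puts sort_line(l) }
```

To process a file instead of the sample array, the last line could be replaced with ARGF.each_line { |l| puts sort_line(l) }.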

Given:

cat file
headword    apple:11    zanana:9    cherry:101  donut:1 egg tart:86

In Ruby, I would do:

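ruby -F"\t" -lane 'puts $F.sort_by{ |w|
  idx = w[/(?<=:)\d+/]
  if idx.nil?
    -1/0.0      # no ":NN" suffix (the headword): -Infinity sorts it first
  else
    -idx.to_i   # negate so larger numbers sort first
  end
}.join("\t")' file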
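The -1/0.0 above is negative infinity, so a field with no ":NN" suffix gets the smallest possible key and sort_by puts it first, wherever it started in the line. A small standalone sketch of that key; the helper name sort_key is hypothetical:

```ruby
# Sort key from the answer above, pulled out into a helper for testing.
# Fields without a ":<digits>" suffix map to -Infinity (sorted first);
# numbered fields map to their negated value (larger numbers first).
def sort_key(field)
  idx = field[/(?<=:)\d+/]
  idx.nil? ? -1 / 0.0 : -idx.to_i
end

fields = "headword\tapple:11\tzanana:9\tcherry:101\tdonut:1\tegg tart:86".split("\t")
puts fields.sort_by { |w| sort_key(w) }.join("\t")

# The headword does not need to be the first field for this to work:
p ["apple:11", "headword", "cherry:101"].sort_by { |w| sort_key(w) }
```

Comparing the Float -Infinity with Integer keys is fine, since Ruby's <=> handles mixed numeric types.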

Or, if we know the first word should not be sorted and all the remaining words have numbers, you can do:

ruby -F"\t" -lane 'hw, *arr = $F; puts "#{hw}\t#{arr.sort_by{ |w| -w[/(?<=:)\d+/].to_i }.join("\t")}"' file

Or, in GNU awk, you can do:

awk 'BEGIN{OFS="\t"}
function byn(i1, v1, i2, v2,    va1, va2)
{
  # fields without a ":" (the headword) sort first
  if (index(v1,":")==0 && index(v2,":")==0) return 0
  if (index(v1,":")==0) return -1
  if (index(v2,":")==0) return 1
  split(v1,va1,/:/)
  split(v2,va2,/:/)
  # larger number first
  if (va1[2]+0 > va2[2]+0)
    return -1
  else if (va1[2]+0 == va2[2]+0)
    return 0
  else
    return 1
}
{split($0, fields, /\t+/)
asort(fields, result, "byn")
for (i=1; i<=length(result); i++)
printf "%s%s", result[i], i==length(result) ? ORS : OFS}' file

All three print:

headword    cherry:101  egg tart:86 apple:11    zanana:9    donut:1
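A comparison function handed to a sort (gawk's asort() in the awk version, or a Ruby sort block) should be consistent: swapping its two arguments should flip the sign of the result. That is why a field without a colon needs -1 when it is the first argument and 1 when it is the second. The Ruby sketch below writes the same ordering as the awk byn() function as a comparator and checks that property over every pair of sample fields; compare_fields is a hypothetical name.

```ruby
# Comparator mirroring the ordering the awk byn() function aims for:
# fields without ":" come first, then larger trailing numbers first.
# compare_fields is a hypothetical helper name.
def compare_fields(a, b)
  a_num = a.include?(":")
  b_num = b.include?(":")
  return 0 unless a_num || b_num   # neither has a number
  return -1 unless a_num           # only a lacks a number: a goes first
  return 1 unless b_num            # only b lacks a number: b goes first
  b.split(":")[1].to_i <=> a.split(":")[1].to_i  # b <=> a gives descending order
end

fields = ["headword", "apple:11", "zanana:9", "cherry:101", "donut:1", "egg tart:86"]

# Antisymmetry check: compare(a, b) must equal -compare(b, a) for every pair.
fields.product(fields).each do |a, b|
  raise "inconsistent for #{a} / #{b}" unless compare_fields(a, b) == -compare_fields(b, a)
end

puts fields.sort { |a, b| compare_fields(a, b) }.join("\t")
```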

Assume a is the result of splitting each line on the tab ("\t") character:

irb(main):009:0> "#{a[0]}\t#{a[1..].sort { |a, b| b.split(":")[1].to_i <=> a.split(":")[1].to_i }.join("\t")}"
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"

Each line is split on tabs, which gives us an array:

["headword", "apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]

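We can leave the first element alone. Then we sort the remaining elements by splitting each one into a key/value pair and comparing the second element (the number). Comparing b to a, rather than a to b, gives descending order.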

ruby -ne 'a=$_.chomp.split("\t"); puts "#{a[0]}\t#{a[1..].sort{|a,b|b.split(":")[1].to_i<=>a.split(":")[1].to_i}.join("\t")}"'
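The descending order comes entirely from the operand order inside the sort block: a <=> b sorts ascending, b <=> a descending. sort_by with a negated key, as some of the other answers use, produces the same values in the same order. A quick illustration on the numbers from the sample line:

```ruby
nums = [11, 9, 101, 1, 86]

ascending  = nums.sort { |a, b| a <=> b }   # [1, 9, 11, 86, 101]
descending = nums.sort { |a, b| b <=> a }   # [101, 86, 11, 9, 1]
negated    = nums.sort_by { |n| -n }        # same as descending

p ascending, descending, negated
```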
str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
hw, *arr = str.split("\t")
hw
#=> "headword"
arr
#=> ["apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]
[hw, *arr.sort_by { |s| -s[/(?<=:)\d+/].to_i }].join("\t")
#=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
> str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
=> "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
> (x = str.split("\t"))[1..-1].sort_by { |x| x.split(':')[-1].to_i }.reverse.prepend(x[0]).join("\t")
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"

Using GNU awk for sorted_in:

$ cat tst.awk
BEGIN {
    FS = OFS = "\t"
    PROCINFO["sorted_in"] = "@val_num_desc"
}
{
    delete nums    # clear the previous line's entries
    for (i=2; i<=NF; i++) {
        split($i,t,":")
        nums[i] = t[2]
    }
    out = $1
    for (i in nums) {
        out = out OFS $i
    }
    print out
}

$ awk -f tst.awk file
sweets  cherry:101      egg tart:86     apple:11        banana:9        donut:1
tossed  salad:26        anted:13        slung:9 flung:6 added:5 lobbed:4       bandied:3        ashley:3
plenty  a lot:83        lots:27 a ton:12        ample:12        lotsa:8 enow:5 gobs:5   aroar:3 abundance:3

If for some reason you really find it useful to cram it all into a "one-liner", you can of course do so:

$ awk -F'\t' 'BEGIN{PROCINFO["sorted_in"]="@val_num_desc"} {delete n; for(i=2;i<=NF;i++){split($i,t,":");n[i]=t[2]}o=$1;for(i in n)o=o FS $i;print o}' file
sweets  cherry:101      egg tart:86     apple:11        banana:9        donut:1
tossed  salad:26        anted:13        slung:9 flung:6 added:5 lobbed:4       bandied:3        ashley:3
plenty  a lot:83        lots:27 a ton:12        ample:12        lotsa:8 enow:5 gobs:5   aroar:3 abundance:3

But it loses a bit of clarity.
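The gawk answer never sorts the fields themselves: it records each field's number under the field's index and lets PROCINFO["sorted_in"] visit those indices in descending order of value. The same index-sorting idea in Ruby, as a sketch; reorder_by_index is a hypothetical name, and breaking ties by original position is an assumption made here for determinism, not something taken from the gawk version.

```ruby
# Sketch of the index-sorting approach: map field index => number,
# sort the indices by number (descending), then emit fields in that order.
# Ties keep their original left-to-right order (an assumption).
def reorder_by_index(line)
  fields = line.chomp.split("\t")
  nums = {}                                  # field index => trailing number
  (1...fields.size).each { |i| nums[i] = fields[i].split(":")[1].to_i }
  order = nums.sort_by { |i, n| [-n, i] }.map(&:first)
  [fields[0], *order.map { |i| fields[i] }].join("\t")
end

puts reorder_by_index("tossed\tadded:5\tanted:13\tashley:3\tbandied:3\tflung:6\tlobbed:4\tsalad:26\tslung:9")
```

Because nums is a fresh local Hash on every call, there is no leftover state from a previous line to clear.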

Not a one-liner, but here is my take:

# keep the first field, sort the rest by the number after ":" (descending)
line = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
fields = line.split(/\t/)
result = [fields[0]]
  .concat(
    fields[1..-1]
      .map {|each| each.split(":")}
      .sort {|a, b| b[1].to_i <=> a[1].to_i}
      .map {|each| each.join(":")}
  )
  .join "\t"
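Splitting on every ":" works for the sample data, but b[1] would read the wrong piece if a name ever contained a colon itself (a hypothetical field such as "ratio:1:2:7"; nothing in the question says this happens). String#rpartition splits only at the last colon, so the trailing number is always the last element. A sketch of that variant, with the hypothetical method name sort_fields_last_colon:

```ruby
# Hypothetical helper: the same map/sort/map pipeline, but each field is
# split only at its last ":" via String#rpartition, so a colon inside the
# name cannot shift which piece is read as the number. For example,
# "ratio:1:2:7".rpartition(":") gives ["ratio:1:2", ":", "7"], and
# joining the three parts rebuilds the original field.
def sort_fields_last_colon(line)
  fields = line.chomp.split("\t")
  [fields[0]]
    .concat(
      fields[1..-1]
        .map { |each| each.rpartition(":") }
        .sort { |a, b| b.last.to_i <=> a.last.to_i }
        .map(&:join)
    )
    .join("\t")
end

# "ratio:1:2:7" is a made-up field whose name contains colons.
puts sort_fields_last_colon("headword\tapple:11\tratio:1:2:7\tcherry:101\tdonut:1")
```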
