I have a series of lines with the following syntax:
sweets apple:11 banana:9 cherry:101 donut:1 egg tart:86
tossed added:5 anted:13 ashley:3 bandied:3 flung:6 lobbed:4 salad:26 slung:9
plenty abundance:3 a lot:83 ample:12 aroar:3 a ton:12 enow:5 gobs:5 lots:27 lotsa:8
(the large spaces are all tabs)
The desired output has columns 2 and onward sorted numerically by the number after the colon.
For example:
sweets cherry:101 egg tart:86 apple:11 banana:9 donut:1
tossed salad:26 anted:13 slung:9 flung:6 added:5 lobbed:4 ashley:3 bandied:3
plenty a lot:83 lots:27 a ton:12 ample:12 lotsa:8 enow:5 gobs:5 abundance:3 aroar:3
I frequently use Ruby one-liners.
# alphabetize within a line, delimited by pipes "|"
ruby -pe '$_=$_.strip.split("|").sort().join("|")+"\n"'
# case insensitive with no dupes:
ruby -pe '$_=$_.strip.split("|").sort_by{|x| x.downcase }.uniq.join("|")+"\n"'
# keep the first term:
ruby -pe '$_=$_.split(":")[0].strip+":"+$_.split(":")[1].strip.split("|").sort.join("|")+"\n"'
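But I can't quite figure out a simple, clean way to sort by the trailing number, i.e. ":NN". I'm sure this can be done in a few characters. How? I'd also be happy with an awk solution, but Ruby is usually cleaner for more complex processing.
Given:
$ cat file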
headword apple:11 zanana:9 cherry:101 donut:1 egg tart:86
In Ruby, I would do:
ruby -F"\t" -lane 'puts $F.sort_by{ |w|
    idx=w[/(?<=:)\d+/]
    if (idx.nil?)
        -1/0.0
    else
        -idx.to_i
    end
    }.join("\t")' file
Or, if we know that the first word should not be sorted and the remaining words all have numbers, you can do:
ruby -F"\t" -lane 'hw, *arr = $F; puts "#{hw}\t#{arr.sort_by{ |w| -w[/(?<=:)\d+/].to_i }.join("\t")}"' file
Or in GNU awk, you can do:
awk 'BEGIN{OFS="\t"}
function byn(i1, v1, i2, v2,    va1, va2)
{
    # a field without a colon (the headword) always sorts first
    if (index(v1,":")==0) return -1
    if (index(v2,":")==0) return 1
    split(v1,va1,/:/)
    split(v2,va2,/:/)
    if (va1[2]>va2[2])
        return -1
    else if (va1[2]==va2[2])
        return 0
    else
        return 1
}
{   split($0, fields, /\t+/)
    asort(fields, result, "byn")
    for (i=1; i<=length(result); i++)
        printf "%s%s", result[i], i==length(result) ? ORS : OFS}' file
All three print:
headword cherry:101 egg tart:86 apple:11 zanana:9 donut:1
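The Ruby one-liners above can also be written as a small method, which may be easier to test and reuse. This is only a sketch: the method name sort_by_trailing_number is made up for illustration, and sending fields without a trailing number to the end of the line is an assumption, not something the question specifies.

```ruby
# Hypothetical helper: keep the first tab-separated field in place and
# sort the remaining fields by the integer after the colon, descending.
def sort_by_trailing_number(line)
  headword, *rest = line.chomp.split("\t")
  sorted = rest.sort_by do |field|
    number = field[/(?<=:)\d+\z/]
    # Assumption: fields without a trailing number sort last.
    number ? -number.to_i : Float::INFINITY
  end
  [headword, *sorted].join("\t")
end

puts sort_by_trailing_number("headword\tapple:11\tzanana:9\tcherry:101\tdonut:1\tegg tart:86")
```

With the sample line from file, this should print the same ordering as the commands above.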
Suppose a is the result of splitting each line on the \t character.
irb(main):009:0> "#{a[0]}\t#{a[1..].sort { |a, b| b.split(":")[1].to_i <=> a.split(":")[1].to_i }.join("\t")}"
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
Each line is split on tabs. This gives us an array:
["headword", "apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]
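We can leave the first element alone. Then we can sort the remaining elements by splitting each one into a key/value pair and comparing the second element of each pair. If we compare b with a, we get descending order.
ruby -ne 'a=$_.chomp.split("\t");puts "#{a[0]}\t#{a[1..].sort{|a,b|b.split(":")[1].to_i<=>a.split(":")[1].to_i}.join("\t")}"'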
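To make the effect of swapping a and b in the comparison concrete, here is a small sketch that sorts the same fields both ways. The lambda number is a hypothetical helper introduced only for this illustration.

```ruby
fields = ["apple:11", "banana:9", "cherry:101"]

# Hypothetical helper that extracts the integer after the colon.
number = ->(field) { field.split(":")[1].to_i }

# Comparing a with b sorts ascending; comparing b with a sorts descending.
ascending  = fields.sort { |a, b| number.(a) <=> number.(b) }
descending = fields.sort { |a, b| number.(b) <=> number.(a) }

p ascending   # => ["banana:9", "apple:11", "cherry:101"]
p descending  # => ["cherry:101", "apple:11", "banana:9"]
```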
str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
hw, *arr = str.split("\t")
hw
  #=> "headword"
arr
  #=> ["apple:11", "banana:9", "cherry:101", "donut:1", "egg tart:86"]
[hw, *arr.sort_by { |s| -s[/(?<=:)\d+/].to_i }].join("\t")
  #=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
> str = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
=> "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
> (x = str.split("\t"))[1..-1].sort_by { |x| x.split(':')[-1].to_i }.reverse.prepend(x[0]).join("\t")
=> "headword\tcherry:101\tegg tart:86\tapple:11\tbanana:9\tdonut:1"
Using GNU awk for sorted_in:
$ cat tst.awk
BEGIN {
    FS=OFS="\t"
    PROCINFO["sorted_in"] = "@val_num_desc"
}
{
    delete nums    # clear values left over from the previous line
    for (i=2; i<=NF; i++) {
        split($i,t,":")
        nums[i] = t[2]
    }
    out = $1
    for (i in nums) {
        out = out OFS $i
    }
    print out
}
$ awk -f tst.awk file
sweets cherry:101 egg tart:86 apple:11 banana:9 donut:1
tossed salad:26 anted:13 slung:9 flung:6 added:5 lobbed:4 bandied:3 ashley:3
plenty a lot:83 lots:27 a ton:12 ample:12 lotsa:8 enow:5 gobs:5 aroar:3 abundance:3
If for some reason you really find it useful to cram it all into a "one-liner", then of course you can:
$ awk -F'\t' 'BEGIN{PROCINFO["sorted_in"]="@val_num_desc"} {delete n;for(i=2;i<=NF;i++){split($i,t,":");n[i]=t[2]}o=$1;for(i in n)o=o FS $i;print o}' file
sweets cherry:101 egg tart:86 apple:11 banana:9 donut:1
tossed salad:26 anted:13 slung:9 flung:6 added:5 lobbed:4 bandied:3 ashley:3
plenty a lot:83 lots:27 a ton:12 ample:12 lotsa:8 enow:5 gobs:5 aroar:3 abundance:3
But it loses a bit of clarity.
Not a one-liner, but here is my take:
line = "headword\tapple:11\tbanana:9\tcherry:101\tdonut:1\tegg tart:86"
fields = line.split(/\t/)
result = [fields[0]]
  .concat(
    fields[1..-1]
      .map {|each| each.split(":")}
      .sort {|a, b| b[1].to_i <=> a[1].to_i}
      .map {|each| each.join(":")}
  )
  .join "\t"
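The sample data contains ties (for example ample:12 and a ton:12), and the Ruby documentation says the result of sort_by is not guaranteed to be stable, so the relative order of tied fields can vary. If a deterministic order matters, one option is to add the original position as a secondary sort key. The following is a sketch under that assumption; keeping ties in input order is my choice here, not a requirement stated in the question.

```ruby
line = "plenty\tabundance:3\ta lot:83\tample:12\taroar:3\ta ton:12\tenow:5\tgobs:5\tlots:27\tlotsa:8"
headword, *rest = line.split("\t")

# The sort key is [negated number, original position]; the position
# breaks ties so that fields with equal numbers keep their input order.
sorted = rest.each_with_index
             .sort_by { |field, i| [-field[/(?<=:)\d+\z/].to_i, i] }
             .map(&:first)

puts [headword, *sorted].join("\t")
```

Because the key is an array, Ruby compares the numbers first and only falls back to the position when the numbers are equal.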