Why doesn't awk retrieve array elements in the same order they were stored?



I have the following data:

SB 1.2.27: SB 1.2.27
SB 1.2.28: SB 1.2.28, SB 1.2.29, SB 1.2.28-29
SB 1.2.29: SB 1.2.28, SB 1.2.29, SB 1.2.28-29
SB 1.2.30: SB 1.2.30
SB 1.3.1: SB 1.3.1
SB 1.21.1: SB 1.21.1

I use the script below to extract the unique values of the second column, along with the dash part (e.g. SB 1.2.28-29):

awk 'BEGIN{FS=": "}{
# I want only the dash part not the whole $2. eg: SB 1.2.28-29
if(match($0,/(SB [0-9]+.[0-9]+.[0-9]+-[0-9]+)$/,hare)){
sloka[$2] = hare[1]
}else{
sloka[$2]= $1
}
}END{
for (i in sloka){
print sloka[i]": "i
}
}' DATA.TXT

The output I get is:

SB 1.2.28-29: SB 1.2.28, SB 1.2.29, SB 1.2.28-29
SB 1.2.30: SB 1.2.30
SB 1.2.27: SB 1.2.27
SB 1.3.1: SB 1.3.1
SB 1.21.1: SB 1.21.1

I expected:

SB 1.2.27: SB 1.2.27
SB 1.2.28-29: SB 1.2.28, SB 1.2.29, SB 1.2.28-29
SB 1.2.30: SB 1.2.30
SB 1.3.1: SB 1.3.1
SB 1.21.1: SB 1.21.1

and not:

SB 1.2.27: SB 1.2.27
SB 1.2.28-29: SB 1.2.28, SB 1.2.29, SB 1.2.28-29
SB 1.2.30: SB 1.2.30
SB 1.21.1: SB 1.21.1  (* this should be next)
SB 1.3.1: SB 1.3.1

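The order in which for (i in array) visits elements is unspecified in awk, so it need not match the order of insertion. My standard approach is to use a second, numerically indexed array, like this: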

awk 'BEGIN{FS=": "; num_elms = 0;}{
# Record each distinct $2 once, in the order it first appears
if (!($2 in sloka)) {
num_elms++
lookup[num_elms] = $2
}
# I want only the dash part not the whole $2. eg: SB 1.2.28-29
if(match($0,/(SB [0-9]+.[0-9]+.[0-9]+-[0-9]+)$/,hare)){
sloka[$2] = hare[1]
}else{
sloka[$2]= $1
}
}END{
for (i = 1; i <= num_elms; i++){
print sloka[lookup[i]]": "lookup[i]
}
}' DATA.TXT

Note: I haven't tested this, but it shows the pattern.
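
The three-argument form of match() used above is a gawk extension. As a minimal sketch of the same insertion-order pattern for POSIX awk, the version below uses RSTART and RLENGTH instead. The function name ordered_sloka and the array name order are made up for illustration, and the dots in the regex are escaped on the assumption that literal dots are intended.

```shell
#!/bin/sh
# ordered_sloka: hypothetical helper; reads the named files, or stdin if none are given.
ordered_sloka() {
    awk 'BEGIN { FS = ": " }
    {
        # Record each distinct $2 once, in first-seen order.
        if (!($2 in sloka)) {
            order[++n] = $2
        }
        # Prefer the trailing "SB a.b.c-d" range if the line ends with one.
        if (match($0, /SB [0-9]+\.[0-9]+\.[0-9]+-[0-9]+$/)) {
            sloka[$2] = substr($0, RSTART, RLENGTH)
        } else {
            sloka[$2] = $1
        }
    }
    END {
        # Walk the numeric index so output follows insertion order.
        for (i = 1; i <= n; i++) {
            print sloka[order[i]] ": " order[i]
        }
    }' "$@"
}

# Demo on the sample data from the question.
ordered_sloka <<'EOF'
SB 1.2.27: SB 1.2.27
SB 1.2.28: SB 1.2.28, SB 1.2.29, SB 1.2.28-29
SB 1.2.29: SB 1.2.28, SB 1.2.29, SB 1.2.28-29
SB 1.2.30: SB 1.2.30
SB 1.3.1: SB 1.3.1
SB 1.21.1: SB 1.21.1
EOF
```

It can also be called on a file, e.g. ordered_sloka DATA.TXT. Because the END loop iterates over 1..n rather than using for (i in sloka), the output order does not depend on how the awk implementation stores its arrays.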

It's really not clear what you're trying to do. Is this it?

$ awk -F'[:,] ' '!seen[$NF]++{sub(/[^:]+/,$NF); print}' file
SB 1.2.27: SB 1.2.27
SB 1.2.28-29: SB 1.2.28, SB 1.2.29, SB 1.2.28-29
SB 1.2.30: SB 1.2.30
SB 1.3.1: SB 1.3.1
SB 1.21.1: SB 1.21.1
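
For readers unpacking the one-liner, here is the same logic spread over several lines with comments. This is only a sketch: the wrapper name dedupe_by_last is hypothetical, and it assumes the last field never contains & or a backslash, which sub() treats specially in the replacement string.

```shell
#!/bin/sh
# dedupe_by_last: hypothetical wrapper; reads the named files, or stdin if none are given.
dedupe_by_last() {
    # Fields are split on a colon or comma followed by a space,
    # so $NF is the last item on the line.
    awk -F'[:,] ' '
    # seen[$NF]++ is 0 (false) the first time a last field appears,
    # so the negation selects only the first line for each distinct $NF.
    !seen[$NF]++ {
        # Replace the text before the first colon with the last field.
        sub(/[^:]+/, $NF)
        print
    }' "$@"
}

# Demo on the sample data from the question.
dedupe_by_last <<'EOF'
SB 1.2.27: SB 1.2.27
SB 1.2.28: SB 1.2.28, SB 1.2.29, SB 1.2.28-29
SB 1.2.29: SB 1.2.28, SB 1.2.29, SB 1.2.28-29
SB 1.2.30: SB 1.2.30
SB 1.3.1: SB 1.3.1
SB 1.21.1: SB 1.21.1
EOF
```

Since each line is printed as it is read, nothing is stored for later traversal and the input order is preserved without a lookup array.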
