Creating a unique array in awk: can this snippet be spelled out in more detail?

Thanks to @EdMorton, I can deduplicate an array in awk like this:

BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)
    # unique it
    for (i=1; i in array; i++) {
        if ( !seen[array[i]]++ ) {
            unique[++j] = array[i]
        }
    }
    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
    # results in:
    # a
    # b
    # c
    # d
    # e
}

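What I don't understand, though, is that the condition ( !seen[array[i]]++ ) contains an increment:

I understand that we collect the unique indices in the seen array,
so we check whether our temporary array seen already has the index array[i] (and if it doesn't yet, we add the value to unique),
but the increment after the index is the part I still can't get :( (despite the detailed explanation Ed provided).

So, my question is: can we rewrite this condition in a more verbose way? Maybe that would really help me finalize my understanding of it :)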

Hopefully this is clearer, but idk. The best I can say is that it's more verbose, as requested!

$ cat tst.awk
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)
    # unique it
    for (i=1; i in array; i++) {
        val = array[i]
        count[val] = count[val] + 1
        if ( count[val] == 1 ) {
            is_first_time_val_seen = 1
        }
        else {
            is_first_time_val_seen = 0
        }
        if ( is_first_time_val_seen ) {
            unique[++j] = val
        }
    }
    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
}

$ awk -f tst.awk
a
b
c
d
e
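
To see what the post-increment in !seen[array[i]]++ is doing on each pass, here is a minimal sketch that traces it step by step. The names old and trace and the shortened input "a b a" are mine, for illustration only. The expression seen[val]++ yields the value from before the increment (an unset element counts as 0) and then adds 1 as a side effect; the ! turns that old value into "true" only when it was 0, i.e. the first time the value is seen.

```shell
trace=$(awk 'BEGIN {
    n = split("a b a", array)
    for (i = 1; i <= n; i++) {
        val = array[i]
        old = seen[val]        # value before the increment; unset reads as 0
        seen[val] = old + 1    # the side effect that ++ performs
        # !old is 1 only when old was 0, i.e. on the first sighting of val
        printf "%s old=%d new=%d first=%d\n", val, old, seen[val], !old
    }
}')
printf '%s\n' "$trace"
```

This should print:

a old=0 new=1 first=1
b old=0 new=1 first=1
a old=1 new=2 first=0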

Another approach is to use the values of array as keys in a new associative array. That enforces uniqueness:

BEGIN {
    # it's helpful to use the return value from `split`
    n = split("a b c d e a b", array)
    # use the element value as a key.
    # It doesn't really matter what the right-hand side of the assignment is.
    for (i = 1; i <= n; i++) uniq[array[i]] = i
    # now, it's easy to iterate over the unique keys
    for (elem in uniq) print elem
}

This outputs, in no guaranteed order:

a
b
c
d
e

If you are using GNU awk, use PROCINFO["sorted_in"] to control the order of array traversal.
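
As a rough sketch of that, assuming GNU awk is installed as gawk (PROCINFO["sorted_in"] is a gawk extension; other awks would just treat it as an ordinary array element), setting it to "@ind_str_asc" makes for (elem in uniq) visit the keys in ascending string order. The shuffled input string and the shell variable sorted are only there for illustration.

```shell
# Assumes GNU awk is available as "gawk"; the block is skipped otherwise.
if command -v gawk >/dev/null 2>&1; then
    sorted=$(gawk 'BEGIN {
        n = split("c a e b d a b", array)
        # use each value as a key, so duplicates collapse
        for (i = 1; i <= n; i++) uniq[array[i]] = i
        # gawk only: traverse keys in ascending string order
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (elem in uniq) print elem
    }')
    printf '%s\n' "$sorted"
else
    echo "gawk not found, skipping" >&2
fi
```

With gawk this should print a, b, c, d, e, one per line, regardless of the order of the input.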
