Thanks to @EdMorton, I can uniquify an array in awk this way:
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        if ( !seen[array[i]]++ ) {
            unique[++j] = array[i]
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
    # results in:
    # a
    # b
    # c
    # d
    # e
}
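However, what I can't understand is this ( !seen[array[i]]++ ) condition with an increment:
- I understand that we collect the unique values as indices of the seen array;
- so we check whether our temporary array seen already has an index array[i] (and add the value to unique if it doesn't);
- but the increment after the index is what I still can't get :( (despite the detailed explanation provided by Ed).
So, my question is: can we rewrite this condition in a more verbose way? Perhaps that would really help me finalize my understanding of it :)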
Hopefully this is clearer, but idk. The best I can say is that it's more verbose, as requested!
$ cat tst.awk
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        val = array[i]
        count[val] = count[val] + 1
        if ( count[val] == 1 ) {
            is_first_time_val_seen = 1
        }
        else {
            is_first_time_val_seen = 0
        }
        if ( is_first_time_val_seen ) {
            unique[++j] = val
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
}
$ awk -f tst.awk
a
b
c
d
e
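If it helps to see the numbers, here is a small sketch that traces what the post-increment hands back on each pass. The variable names old and trace and the shortened input "a b a" are only for illustration, they are not from the original script. The point is that seen[val]++ yields the count from before the bump, so !old is true only the first time a value turns up.

```shell
# Trace the value that seen[val]++ yields on each pass through the loop.
# "old" and "trace" are illustrative names, not part of the original script.
trace=$(awk 'BEGIN {
    n = split("a b a", array)
    for (i = 1; i <= n; i++) {
        val = array[i]
        old = seen[val]++    # post-increment: old gets the count BEFORE the bump
        printf "%s old=%d new=%d first=%d\n", val, old, seen[val], !old
    }
}')
printf '%s\n' "$trace"
```

This should print old=0 and first=1 for the first a and for b, and old=1 and first=0 for the second a, which is why only the first occurrence gets copied into unique.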
Another approach is to put the values of array into a new associative array as keys. This enforces uniqueness:
BEGIN {
    # it's helpful to use the return value from `split`
    n = split("a b c d e a b", array)

    # use the element value as a key.
    # It doesn't really matter what the right-hand side of the assignment is.
    for (i = 1; i <= n; i++) uniq[array[i]] = i

    # now, it's easy to iterate over the unique keys
    for (elem in uniq) print elem
}
Output, with no guaranteed order:
a
b
c
d
e
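If the first-seen order matters and only POSIX awk is available, one possible variation (a sketch, not something from the answer above) is to walk the array by index and print a value only when it is not yet a key of uniq. The in operator tests membership without creating the element, and the shell variable ordered is just an illustrative name.

```shell
# Print each value the first time it appears, keeping the input order.
# "ordered" is an illustrative name; plain POSIX awk is assumed to be enough.
ordered=$(awk 'BEGIN {
    n = split("a b c d e a b", array)
    for (i = 1; i <= n; i++) {
        val = array[i]
        if (!(val in uniq)) {    # membership test, does not create uniq[val]
            uniq[val] = i        # remember where val first appeared
            print val
        }
    }
}')
printf '%s\n' "$ordered"
```

Since the loop runs over the numeric indices 1..n instead of using for (elem in uniq), the output order follows the original string.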
If you're using GNU awk, use PROCINFO["sorted_in"] to control the order of array traversal.
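A minimal sketch of that, assuming gawk is installed and on the PATH: "@ind_str_asc" is one of gawk's predefined traversal orders (indices compared as strings, ascending). The scrambled input string and the shell variable sorted are just for illustration.

```shell
# Requires GNU awk (gawk); other awks ignore PROCINFO["sorted_in"].
# "sorted" is an illustrative name, and the input is scrambled on purpose.
sorted=$(gawk 'BEGIN {
    n = split("e d c b a e d", array)
    for (i = 1; i <= n; i++) uniq[array[i]] = i

    # traverse the keys as strings in ascending order
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (elem in uniq) print elem
}')
printf '%s\n' "$sorted"
```

With that setting the keys should come out as a, b, c, d, e regardless of the order they were inserted in.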