Collecting an array of unique data items from a bash loop - readarray does not append?



My task is to process a text file with Bash so that only the relevant details are retrieved. Here is a sample of the file's contents:

Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:ff via 1.2.3.188: peer holds all free leases
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:ff via 1.2.3.189: peer holds all free leases
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:gg via eth0: network 1.2.64.0/24: no free leases
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:gg via eth0: network 1.2.65.0/24: no free leases

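I read each line and test whether it contains the string "peer holds all" or "no free leases". Depending on which string the line contains, I process it further by extracting part of the line and pushing it onto an array.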

while IFS= read -r line;
do
    if [[ $line == *"peer holds all"* ]]; then
        readarray -t peer_holds_array < <(echo "${line}" | awk '{print $10}' | sed -e 's/:$//g')
    elif [[ $line == *"no free leases"* ]]; then
        readarray -t no_free_leases_array < <(echo "${line}" | awk '{print $12}' | sed -e 's/:$//g')
    fi
done < <(grep -i "peer holds all\|no free leases" daemon.log)
peer_holds_uniq=($(printf "%s\n" "${peer_holds_array[@]}" | sort -u))
no_free_lease_uniq=($(printf "%s\n" "${no_free_lease_array[@]}" | sort -u))
printf "Peer Holds Leases - Via:\n"
printf "${peer_holds_uniq[@]}\n"
printf "No Free Leases:\n"
printf "${no_free_lease_uniq[@]}\n"

Expected result:

Peer Holds Leases - Via:
1.2.3.188
1.2.3.189
No Free Leases:
1.2.64.0/24
1.2.65.0/24

Actual result:

Peer Holds Leases - Via:
1.2.3.188
No Free Leases:
1.2.64.0/24

A working implementation might look like this:

#!/usr/bin/env bash
case $BASH_VERSION in ''|[1-3]*) echo "ERROR: Bash 4.0 or newer is needed" >&2; exit 1;; esac
generate_input() {  # so this can be run by people without your real input file
cat <<'EOF'
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:ff via 1.2.3.188: peer holds all free leases
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:ff via 1.2.3.189: peer holds all free leases
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:gg via eth0: network 1.2.64.0/24: no free leases
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:gg via eth0: network 1.2.65.0/24: no free leases
EOF
}
set -x # enable debug logging
peer_holds_re=' via ([[:digit:].]+): peer holds all'     # define regular expressions
no_free_leases_re='network ([[:digit:]/.]+): no free leases'
declare -A peer_holds_array=( ) no_free_lease_array=( )  # initialize associative arrays
while IFS= read -r line; do
    if [[ $line =~ $peer_holds_re ]]; then               # testing [[ $string =~ $re ]]
        peer_holds_array[${BASH_REMATCH[1]}]=1           # ...sets ${BASH_REMATCH[@]} array
    elif [[ $line =~ $no_free_leases_re ]]; then
        no_free_lease_array[${BASH_REMATCH[1]}]=1
    fi
done < <(generate_input | grep -Ei "peer holds all|no free leases")
printf "Peer Holds Leases - Via:\n"
printf '%s\n' "${!peer_holds_array[@]}"
printf "No Free Leases:\n"
printf '%s\n' "${!no_free_lease_array[@]}"
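
  • Using bash's built-in regex support ([[ $string =~ $regex ]]) means we don't have to worry about how many fields a line splits into; it is also hundreds of times faster than starting an echo | awk | sed pipeline for every line of input.
  • We switched to using the keys of an associative array for the data, because keys are inherently unique. Here the actual data is the key, and the value associated with each key is just a placeholder constant (1 in this case).
  • readarray overwrites the entire target array, so it cannot be used to add items incrementally. Use array+=( "first item to append" "second item to append" ) for a regular array; or, as here, set keys in an associative array with array["item to set"]=1
  • printf takes a format string and repeats it for each group of arguments needed to fill the placeholders in that string. So printf '%s\n' 'First line' 'Second line' substitutes First line for the %s in %s\n, then repeats the format with Second line.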
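
To make the last two points concrete, here is a minimal sketch (Bash 4.0 or newer assumed, since it uses readarray). The array names overwritten and appended and the two addresses are illustrative only and are not part of the script above.

```bash
#!/usr/bin/env bash
# Illustrative names and values only; the point is how readarray, += and printf behave.

# readarray replaces the whole target array on every call,
# so only the item read by the most recent call survives.
readarray -t overwritten < <(printf '%s\n' "1.2.3.188")
readarray -t overwritten < <(printf '%s\n' "1.2.3.189")
echo "after two readarray calls: ${#overwritten[@]} item(s): ${overwritten[*]}"

# += appends to a regular array, so both items are kept.
appended=()
appended+=( "1.2.3.188" )
appended+=( "1.2.3.189" )
echo "after two += appends: ${#appended[@]} item(s): ${appended[*]}"

# printf reuses its format string once per argument: one line per element.
printf '%s\n' "${appended[@]}"
```

The first echo reports a single item (the second address), the second reports both, and the final printf prints one address per line.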

You can see it running at https://ideone.com/GmZYrV

For a version that uses regular arrays, see the edit history of this answer.
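
As a rough idea of what a regular-array variant could look like, here is a sketch. It is not the code from the edit history: it is an assumption-based reconstruction that appends with += inside the loop and deduplicates once at the end with sort -u. The function sample_input and the array names are hypothetical, and the first log line is repeated on purpose to show the deduplication.

```bash
#!/usr/bin/env bash
# Sketch only: a regular-array variant, not the code from the answer's edit history.
sample_input() {   # hypothetical stand-in for daemon.log; first line is duplicated
cat <<'EOF'
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:ff via 1.2.3.188: peer holds all free leases
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:ff via 1.2.3.188: peer holds all free leases
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:ff via 1.2.3.189: peer holds all free leases
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:gg via eth0: network 1.2.64.0/24: no free leases
Jul 21 09:29:10 serverbkp dhcpd: DHCPDISCOVER from aa:bb:cc:dd:ee:gg via eth0: network 1.2.65.0/24: no free leases
EOF
}

peer_holds_re=' via ([[:digit:].]+): peer holds all'
no_free_leases_re='network ([[:digit:]/.]+): no free leases'
peer_holds=()
no_free_leases=()

while IFS= read -r line; do
    if [[ $line =~ $peer_holds_re ]]; then
        peer_holds+=( "${BASH_REMATCH[1]}" )        # append instead of overwriting
    elif [[ $line =~ $no_free_leases_re ]]; then
        no_free_leases+=( "${BASH_REMATCH[1]}" )
    fi
done < <(sample_input)

# Deduplicate after the loop; readarray is fine here because it runs once per array.
# Caveat: if an input array is empty, printf still emits one blank line.
readarray -t peer_holds_uniq < <(printf '%s\n' "${peer_holds[@]}" | sort -u)
readarray -t no_free_leases_uniq < <(printf '%s\n' "${no_free_leases[@]}" | sort -u)

printf 'Peer Holds Leases - Via:\n'
printf '%s\n' "${peer_holds_uniq[@]}"
printf 'No Free Leases:\n'
printf '%s\n' "${no_free_leases_uniq[@]}"
```

Compared with the associative-array approach, this keeps every match in memory until the final sort, but the output comes out sorted, whereas the order of associative-array keys is unspecified.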

FWIW here's how I'd do it, using GNU awk for gensub() and sorted_in:

$ cat tst.awk
{ addr = gensub(/.* ([^:]+):.*$/,"\\1",1) }
/peer holds all/ { peers[addr] }
/no free leases/ { frees[addr] }
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    print "Peer Holds Leases - Via:"
    for (addr in peers) {
        print addr
    }
    print "No Free Leases:"
    for (addr in frees) {
        print addr
    }
}
$ awk -f tst.awk file
Peer Holds Leases - Via:
1.2.3.188
1.2.3.189
No Free Leases:
1.2.64.0/24
1.2.65.0/24
