Averaging over awk arrays



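Using AWK (and nothing else, since I am trying to learn AWK), I want to evaluate whether a server sends less data to a particular IP than the other servers do.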

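The log contains:

  • Log time: the hour of the day, from 01 to 24 (same day)
  • Server name
  • The IP the server reached during that time slot
  • The number of times the server reached that IP during that time slot

Input: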

$ cat Iplogs.txt  
Time,Source,destinationIP,Count  
11,server1,123.12.23.122,10  
11,server1,125.25.45.221,153  
11,server1,202.178.23.4,44  
11,server2,123.12.23.122,300  
11,server2,125.25.45.221,140  
11,server2,202.178.23.4,41  
12,server1,123.12.23.122,0  
12,server1,125.25.45.221,153  
12,server1,202.178.23.4,44  
12,server2,123.12.23.122,300  
12,server2,125.25.45.221,140  
12,server2,202.178.23.4,41

Expected result:

server1,125.25.45.221,306,52.21%      #306/586*100
server2 125.25.45.221,47.78%      #280/586*100
server1 202.178.23.4,51.76%      #88/170*100
server1 123.12.23.122,1.63%      #10/610*100
server2 202.178.23.4,48.23%      #82/170*100
server2 123.12.23.122,98.36%      #600/610*100

Ratio: 1.63% is the traffic from server1 to 123.12.23.122 divided by the whole-day total of the traffic from server1 and server2 to 123.12.23.122, multiplied by 100.
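To make the arithmetic behind that ratio concrete, here is a minimal sketch for the single pair server1 / 123.12.23.122, with the counts copied by hand from the sample input. The names `s1`, `s2`, and `ratio` are chosen for illustration only.

```shell
# Counts for destination 123.12.23.122, taken from the sample input:
#   server1: 10 (hour 11) + 0 (hour 12)
#   server2: 300 (hour 11) + 300 (hour 12)
ratio=$(awk 'BEGIN { s1 = 10 + 0; s2 = 300 + 300; printf "%.4f", s1 * 100 / (s1 + s2) }')
echo "$ratio"    # 1.6393
```

The unrounded value is 1.6393, so the 1.63% shown in the expected result corresponds to truncating to two decimals, whereas rounding would give 1.64%.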

What I have done so far: this command gives the cumulative count for each IP:

$ awk -F"," '{IP[$3];MAX[$3]+=$4} END {for(i in IP) print i," ",IP[i]," ",MAX[i]}' Iplogs.txt
123.12.23.122      610
125.25.45.221      586
202.178.23.4      170

This command gives the cumulative count for each server, per IP reached:

$ awk -F"," '{SRVbyIP[$2" "$3];COUNT[$2" "$3]+=$4} END {for(j in SRVbyIP) print j," ",SRVbyIP[j]," ",COUNT[j]}' Iplogs.txt | sort
server1 125.25.45.221      306
server2 125.25.45.221      280
server1 202.178.23.4      88
server1 123.12.23.122      10
server2 202.178.23.4      82
server2 123.12.23.122      600

…but I cannot find a way to divide COUNT[j] by MAX[i].
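One possible way to do that division while keeping the two arrays from the commands above is to build both in a single awk run and, in the END block, recover the IP from each "server IP" key with split(). The following is only a sketch under a few assumptions: the wrapper function `pct_by_pair`, the variable `sample` (a shortened copy of the input), and the skipping of the header line are illustrative additions, not part of the original commands.

```shell
# Hypothetical helper: per-pair share of the per-IP total, read from stdin or files.
pct_by_pair() {
    awk -F, '
    NR == 1 { next }                      # skip the header line
    {
        MAX[$3] += $4                     # total per destination IP
        COUNT[$2 " " $3] += $4            # total per "server IP" pair
    }
    END {
        for (j in COUNT) {
            split(j, part, " ")           # part[1] = server, part[2] = IP
            ip = part[2]
            if (MAX[ip] > 0)              # avoid dividing by zero
                printf "%s %.2f%%\n", j, COUNT[j] * 100 / MAX[ip]
        }
    }' "$@"
}

# Shortened sample: only the rows for 123.12.23.122.
sample='Time,Source,destinationIP,Count
11,server1,123.12.23.122,10
11,server2,123.12.23.122,300
12,server1,123.12.23.122,0
12,server2,123.12.23.122,300'

result=$(printf '%s\n' "$sample" | pct_by_pair | LC_ALL=C sort)
printf '%s\n' "$result"
```

Because the key was built as `$2" "$3`, splitting on a single space is enough here; it would not be if server names could contain spaces.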

Assumptions/understanding:

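  • The Counts need to be summed for each distinct IP
  • The Counts need to be summed for each distinct host/IP pair
  • The output should list each distinct host/IP pair, together with the sum for that host/IP pair and the result of dividing the host/IP sum by the sum for the associated IP
  • The output is sorted by host, then by IP
  • We will let awk perform normal rounding to 2 decimal places (e.g., for a result of 1.6393 we should print 1.64); if the OP needs truncation (e.g., 1.6393 becomes 1.63), a small tweak to the code is needed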
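The difference between rounding and truncating can be checked on a single value. This sketch uses the server2 / 202.178.23.4 pair from the sample data (82 out of 170); the variable names `rounded` and `truncated` are illustrative.

```shell
# 82 * 100 / 170 = 48.2352...
rounded=$(awk 'BEGIN { printf "%.2f", 82 * 100 / 170 }')
# Scale by 10000, drop the fraction with int(), then scale back down by 100.
truncated=$(awk 'BEGIN { printf "%.2f", int(82 * 10000 / 170) / 100 }')
echo "rounded=$rounded truncated=$truncated"    # rounded=48.24 truncated=48.23
```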

One awk approach:

awk '
BEGIN   { FS=OFS="," }                                          # define input/output field delimiters as a comma
FNR==1  { next }                                                # skip header line
        { hosts[$2]                                             # maintain list of hosts
          ip_sums[$3]+=$4                                       # sum up Counts by ip ($3)
          host_sums[$2,$3]+=$4                                  # sum up Counts by host ($2) and ip ($3)
        }
END     { for (host in hosts)                                   # loop through list of hosts
              for (ip in ip_sums) {                             # loop through list of ips for a given host
                  if (! ((host,ip) in host_sums)) continue      # if no entry in host_sums[] for this host/ip pair then skip to next iteration of loop
                  if (ip_sums[ip]==0)                           # if the sum for this ip is zero then address "divide by zero" error by ...
                      pct="0.00"                                # hardcoding the percent as 0.00
                  else {                                        # calculate percentage; uncommented line == "rounded"; commented line == "truncated"
                      pct=sprintf("%0.2f", host_sums[host,ip]*100/ip_sums[ip])
#                     pct=sprintf("%0.2f", int(host_sums[host,ip]*10000/ip_sums[ip]) /100)
                  }
                  print host,ip,host_sums[host,ip],pct "%"
              }
        }
' Iplogs.txt | sort -t',' -V -k1,1 -k2,2

This produces:

server1,123.12.23.122,10,1.64%
server1,125.25.45.221,306,52.22%
server1,202.178.23.4,88,51.76%
server2,123.12.23.122,600,98.36%
server2,125.25.45.221,280,47.78%
server2,202.178.23.4,82,48.24%

Sorted by IP first, then by host (sort -t',' -V -k2,2 -k1,1):

server1,123.12.23.122,10,1.64%
server2,123.12.23.122,600,98.36%
server1,125.25.45.221,306,52.22%
server2,125.25.45.221,280,47.78%
server1,202.178.23.4,88,51.76%
server2,202.178.23.4,82,48.24%
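As a variation on the approach above, the END block could loop over the keys of host_sums directly and split each key on SUBSEP (the separator awk places between the subscripts of `host_sums[$2,$3]`), which removes the need for the hosts[] array and the nested loops. This is only a sketch: the function name `pct_single_loop` and the variable `sample` (a shortened copy of the input) are illustrative, and plain `sort` under `LC_ALL=C` is used instead of `-V`, which is enough for this sample but compares IPs as text.

```shell
# Hypothetical helper: same sums as above, but one loop over the host/ip keys.
pct_single_loop() {
    awk '
    BEGIN  { FS = OFS = "," }
    FNR==1 { next }                              # skip header line
    {
        ip_sums[$3] += $4                        # sum Counts by ip
        host_sums[$2,$3] += $4                   # sum Counts by host and ip
    }
    END {
        for (key in host_sums) {
            split(key, part, SUBSEP)             # part[1] = host, part[2] = ip
            host = part[1]; ip = part[2]
            if (ip_sums[ip] == 0)
                pct = "0.00"                     # avoid dividing by zero
            else
                pct = sprintf("%0.2f", host_sums[key] * 100 / ip_sums[ip])
            print host, ip, host_sums[key], pct "%"
        }
    }' "$@"
}

# Shortened sample: two of the three destination IPs.
sample='Time,Source,destinationIP,Count
11,server1,123.12.23.122,10
11,server2,123.12.23.122,300
11,server1,202.178.23.4,44
11,server2,202.178.23.4,41
12,server1,123.12.23.122,0
12,server2,123.12.23.122,300
12,server1,202.178.23.4,44
12,server2,202.178.23.4,41'

result=$(printf '%s\n' "$sample" | pct_single_loop | LC_ALL=C sort -t',' -k1,1 -k2,2)
printf '%s\n' "$result"
```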
