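Using AWK (and nothing else, since I am trying to learn AWK), I want to evaluate whether a server sends less data to a specific IP than the other servers do.
The log contains:
- Log time: 01 to 24 (hour of the current day)
- Server name
- The IP reached by the server during a given time slot
- The number of times the server reached that IP during that time slot
Input: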
$ cat Iplogs.txt
Time,Source,destinationIP,Count
11,server1,123.12.23.122,10
11,server1,125.25.45.221,153
11,server1,202.178.23.4,44
11,server2,123.12.23.122,300
11,server2,125.25.45.221,140
11,server2,202.178.23.4,41
12,server1,123.12.23.122,0
12,server1,125.25.45.221,153
12,server1,202.178.23.4,44
12,server2,123.12.23.122,300
12,server2,125.25.45.221,140
12,server2,202.178.23.4,41
Expected result:
server1,125.25.45.221,306,52.21% #306/586*100
server2 125.25.45.221,47.78% #280/586*100
server1 202.178.23.4,51.76% #88/170*100
server1 123.12.23.122,1.63% #10/610*100
server2 202.178.23.4,48.23% #82/170*100
server2 123.12.23.122,98.36% #600/610*100
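Here % is a ratio: 1.63% is the traffic from server1 to 123.12.23.122 divided by the sum of the traffic from server1 and server2 to 123.12.23.122 over the whole day, multiplied by 100.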
What I have done so far: this command gives the cumulative count for each IP:
$ awk -F"," '{IP[$3];MAX[$3]+=$4} END {for(i in IP) print i," ",IP[i]," ",MAX[i]}' Iplogs.txt
123.12.23.122 610
125.25.45.221 586
202.178.23.4 170
This command gives the cumulative count for each server, broken down by the IP reached:
$ awk -F"," '{SRVbyIP[$2" "$3];COUNT[$2" "$3]+=$4} END {for(j in SRVbyIP) print j," ",SRVbyIP[j]," ",COUNT[j]}' Iplogs.txt | sort
server1 125.25.45.221 306
server2 125.25.45.221 280
server1 202.178.23.4 88
server1 123.12.23.122 10
server2 202.178.23.4 82
server2 123.12.23.122 600
… but I cannot find a way to divide COUNT[j] by MAX[i].
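Assumptions/understandings:
- need to sum the Counts for each distinct IP
- need to sum the Counts for each distinct host/IP pair
- output should be a list of each distinct host/IP pair, along with the sum for that host/IP pair and the result of dividing the host/IP sum by the sum for the associated IP
- output is sorted by host, then by IP
- we'll let awk perform normal rounding to 2 decimal places (e.g., for a result of 1.6393 we should print 1.64); if OP needs truncation (e.g., 1.6393 becomes 1.63) then we'll need to make a small tweak to the code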
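To make the rounding/truncation distinction concrete, here is a minimal sketch using the server1 to 123.12.23.122 figures from the question (10 out of 610). The function name round_vs_trunc is only an illustrative wrapper, not part of the original commands.

```shell
# Compare rounding and truncation for server1 -> 123.12.23.122 (10 out of 610).
# round_vs_trunc is a hypothetical helper name used only for this illustration.
round_vs_trunc() {
    awk 'BEGIN {
        part = 10; total = 610
        rounded   = sprintf("%.2f", part * 100 / total)               # sprintf rounds to 2 decimals
        truncated = sprintf("%.2f", int(part * 10000 / total) / 100)  # int() drops digits past the 2nd decimal
        print "rounded=" rounded, "truncated=" truncated
    }'
}
round_vs_trunc
```

This prints rounded=1.64 truncated=1.63, which would explain why the expected result in the question shows 1.63% where plain formatting gives 1.64%.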
One awk approach:
awk '
BEGIN  { FS=OFS="," }                       # define input/output field delimiters as a comma
FNR==1 { next }                             # skip header line
       { hosts[$2]                          # maintain list of hosts
         ip_sums[$3]+=$4                    # sum up Counts by ip ($3)
         host_sums[$2,$3]+=$4               # sum up Counts by host ($2) and ip ($3)
       }
END    { for (host in hosts)                # loop through list of hosts
             for (ip in ip_sums) {          # loop through list of ips for a given host
                 if (! ((host,ip) in host_sums)) continue   # if no entry in host_sums[] for this host/ip pair then skip to next iteration of loop
                 if (ip_sums[ip]==0)        # if the sum for this ip is zero then address "divide by zero" error by ...
                     pct="0.00"             # hardcoding the percent as 0.00
                 else {                     # calculate percentage; uncommented line == "rounded"; commented line == "truncated"
                     pct=sprintf("%0.2f", host_sums[host,ip]*100/ip_sums[ip])
                   # pct=sprintf("%0.2f", int(host_sums[host,ip]*10000/ip_sums[ip]) /100)
                 }
                 print host,ip,host_sums[host,ip],pct "%"
             }
       }
' Iplogs.txt | sort -t',' -V -k1,1 -k2,2
This generates:
server1,123.12.23.122,10,1.64%
server1,125.25.45.221,306,52.22%
server1,202.178.23.4,88,51.76%
server2,123.12.23.122,600,98.36%
server2,125.25.45.221,280,47.78%
server2,202.178.23.4,82,48.24%
Sorted by IP first, then by host (sort -t',' -V -k2,2 -k1,1):
server1,123.12.23.122,10,1.64%
server2,123.12.23.122,600,98.36%
server1,125.25.45.221,306,52.22%
server2,125.25.45.221,280,47.78%
server1,202.178.23.4,88,51.76%
server2,202.178.23.4,82,48.24%
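For comparison, a possible variant walks the host/IP sums in a single loop and recovers the host and IP by splitting the composite key on SUBSEP, which is close to the COUNT[j]/MAX[i] division the question was after. This is only a sketch under stated assumptions: the function name pct_by_pair and the array name pair_sums are made up for illustration, the input is read from stdin, and only the 123.12.23.122 rows from the sample data are used to keep it short.

```shell
# pct_by_pair is a hypothetical helper: reads the CSV on stdin and prints
# host,ip,sum,pct% lines in arbitrary order (sort them afterwards).
pct_by_pair() {
    awk '
    BEGIN  { FS = "," }
    FNR==1 { next }                           # skip header line
           { ip_sums[$3]      += $4           # total Count per ip
             pair_sums[$2,$3] += $4 }         # total Count per host/ip pair
    END    { for (key in pair_sums) {
                 split(key, part, SUBSEP)     # part[1] = host, part[2] = ip
                 total = ip_sums[part[2]]
                 pct = (total == 0) ? 0 : pair_sums[key] * 100 / total
                 printf "%s,%s,%d,%.2f%%\n", part[1], part[2], pair_sums[key], pct
             }
           }'
}

pct_by_pair <<'EOF' | sort -t',' -k1,1 -k2,2
Time,Source,destinationIP,Count
11,server1,123.12.23.122,10
11,server2,123.12.23.122,300
12,server1,123.12.23.122,0
12,server2,123.12.23.122,300
EOF
```

With these four rows it prints server1,123.12.23.122,10,1.64% followed by server2,123.12.23.122,600,98.36%. The plain sort used here orders IPs lexically, so the -V option from the answer is still the better choice when IPs with different digit counts need numeric-aware ordering.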