Awk: average of a count per minute in a file



Suppose I have a file like this:

13.03.2013 12:13:01 | STRING1 | NUMBER1 | 1 | NUMBER3
13.03.2013 12:13:08 | STRING1 | NUMBER1 | 12 | NUMBER3
13.03.2013 12:13:09 | STRING3 | NUMBER1 | 13 | NUMBER3
13.03.2013 12:13:12 | STRING2 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:13:15 | STRING2 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:13:18 | STRING1 | NUMBER1 | 13 | NUMBER3
13.03.2013 12:13:20 | STRING2 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:13:25 | STRING3 | NUMBER1 | 51 | NUMBER3
13.03.2013 12:13:38 | STRING2 | NUMBER1 | 71 | NUMBER3
13.03.2013 12:13:40 | STRING1 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:13:42 | STRING1 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:13:55 | STRING3 | NUMBER1 | 71 | NUMBER3
13.03.2013 12:14:02 | STRING1 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:14:07 | STRING1 | NUMBER1 | 13 | NUMBER3
13.03.2013 12:14:08 | STRING3 | NUMBER1 | 13 | NUMBER3
13.03.2013 12:14:15 | STRING2 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:14:16 | STRING2 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:14:16 | STRING1 | NUMBER1 | 1 | NUMBER3
13.03.2013 12:14:20 | STRING2 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:14:25 | STRING3 | NUMBER1 | 51 | NUMBER3
13.03.2013 12:14:37 | STRING2 | NUMBER1 | 71 | NUMBER3
13.03.2013 12:14:42 | STRING1 | NUMBER1 | 1 | NUMBER3
13.03.2013 12:14:45 | STRING1 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:14:58 | STRING3 | NUMBER1 | 51 | NUMBER3
13.03.2013 12:15:06 | STRING2 | NUMBER1 | 11 | NUMBER3
13.03.2013 12:15:13 | STRING1 | NUMBER1 | 43 | NUMBER3
13.03.2013 12:15:22 | STRING2 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:15:26 | STRING3 | NUMBER1 | 51 | NUMBER3
13.03.2013 12:15:35 | STRING2 | NUMBER1 | 71 | NUMBER3
13.03.2013 12:15:40 | STRING1 | NUMBER1 | 1 | NUMBER3
13.03.2013 12:15:42 | STRING1 | NUMBER1 | 21 | NUMBER3
13.03.2013 12:15:53 | STRING3 | NUMBER1 | 71 | NUMBER3

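I want to find the per-minute average of column 4 (the field after the third |), restricted to lines that match a variable X. For example, if $X="STRING1", the result should be: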

13.03.2013 12:13 | STRING1 | 11.6
13.03.2013 12:14 | STRING1 | 7.4
13.03.2013 12:15 | STRING1 | 21.666

So for each minute we pick the lines that contain $X and compute the average of column 4 over those lines. How can this be done?

You can use the following awk program:

example.awk:

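# Run with -F'|'. SEARCH is treated as a regex matched against the whole line.
$0 ~ SEARCH {
  split($1, time, ":")          # time[1] = "dd.mm.yyyy HH", time[2] = "MM"
  min = time[1] ":" time[2]     # key on date, hour and minute so different hours do not collide
  total[min] += $4
  count[min]++
}
END {
  # for-in order is unspecified; pipe the output through sort if order matters
  for (m in total) {
    printf "%s|%s|%s\n", m, SEARCH, total[m] / count[m]
  }
}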

Run it:

awk -F'|' -v SEARCH=STRING1 -f example.awk your.log

Output:

13.03.2013 12:13|STRING1|11.6
13.03.2013 12:14|STRING1|7.4
13.03.2013 12:15|STRING1|21.6667
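
Alternatively, if the file is sorted by time, a single pass without arrays works:

awk -v X="STRING1" '
    BEGIN { FS = " *[|] *"; OFS = " | " }    # split on "|" and drop the padding around it
    $2 != X { next }                           # keep only rows whose 2nd field is exactly X
    { min = substr($1, 1, 16) }                # "dd.mm.yyyy HH:MM"
    min != prev {
        if (n > 0) print prev, X, total / n    # flush the previous minute
        total = n = 0
        prev = min
    }
    { n++; total += $4 }
    END { if (n > 0) print prev, X, total / n }
' file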
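
To sanity-check either approach on a handful of lines, here is a minimal sketch that wraps the same idea in a shell function. The name minute_avg and the "%.6g" output format are assumptions made for illustration, not part of the answers above. It reads the log from standard input, matches the second field exactly, and prints minutes in the order they first appear.

```shell
# Hypothetical helper: per-minute average of column 4 for rows whose
# second field equals $1 exactly. Reads the log from standard input.
minute_avg() {
  awk -F' *[|] *' -v X="$1" '
    $2 == X {
      min = substr($1, 1, 16)                 # "dd.mm.yyyy HH:MM"
      if (!(min in count)) order[++k] = min   # remember first-seen order
      total[min] += $4
      count[min]++
    }
    END {
      for (i = 1; i <= k; i++) {
        m = order[i]
        printf "%s | %s | %.6g\n", m, X, total[m] / count[m]
      }
    }
  '
}
```

Usage: minute_avg STRING1 < your.log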
