Classify a column and count the warnings in it

I have a file named out.txt that looks like this:

Statement 1                        Statement 2  Statement 3    Statement 4
The declaration is not done         /   Exp     /   *       /  This is expected
The declaration is starting/started /   St      /   *       /  This is not expected
The declaration is not yet designed /   Yt      /   &       /  This is a major one
The declaration is confirmed        /   Exp     /   *       /  This is okay
The declaration is not confirmed    /   Ntp     /   &       /  This is a major issue

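I need to summarize and classify the rows by the third column (Statement 3). If it is *, the row is a warning. If it is &, the row is an error. Within each group, the Statement 2 values should be counted, as shown below: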

Out:
Warnings:
Exp : 2
St  : 1
Total : 3
Errors:
Yt : 1
Ntp: 1
Total :2

I tried the following code, but it does not produce the exact output:

#!/bin/bash
echo " "
File="out.txt"
for z in "$File"; do
    if grep -q "&" "$z"; then
        echo "$z:"
        # Count each Statement 2 value ($2) and print a grand total
        awk -F' / ' '
        { a[$2]++ }
        END { for (j in a) { print j, a[j]; s = s + a[j] }
              print "Total :", s }' "$z"
    else
        echo "$z:"
    fi
done
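One quick way to see why this attempt does not match the expected output is to print what -F' / ' actually puts into each field. The sketch below assumes a POSIX sh and any awk, and show_fields and attempt_total are hypothetical helper names. It recreates the sample in a temporary file. The output shows that the header line is processed too (NF=1, empty $2), that $2 keeps its surrounding padding, and that $3, the * or & marker, is never consulted. As a result, the single total mixes warnings, errors and the header.

```shell
#!/bin/sh
# Recreate the question's out.txt in a temporary file (removed on exit).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
Statement 1                        Statement 2  Statement 3    Statement 4
The declaration is not done         /   Exp     /   *       /  This is expected
The declaration is starting/started /   St      /   *       /  This is not expected
The declaration is not yet designed /   Yt      /   &       /  This is a major one
The declaration is confirmed        /   Exp     /   *       /  This is okay
The declaration is not confirmed    /   Ntp     /   &       /  This is a major issue
EOF

# Hypothetical helper: print NF plus $2 and $3 in brackets so padding is visible.
show_fields() {
  awk -F' / ' '{ printf "NR=%d NF=%d $2=[%s] $3=[%s]\n", NR, NF, $2, $3 }' "$1"
}

# Hypothetical helper: the attempt's counting logic, reduced to its grand total.
attempt_total() {
  awk -F' / ' '{ a[$2]++ } END { for (j in a) s += a[j]; print "Total :", s }' "$1"
}

show_fields "$sample"
attempt_total "$sample"   # prints: Total : 6  (header line + 5 data rows)
```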

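EDIT2: Since the OP confirmed that there is no fixed list of error keywords, the classification should be decided by the & marker in the second-to-last field of each line. Try the following: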

awk -F'/' '
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+|\//,"",val)
  str=$(NF-1)
  gsub(/ +/,"",str)
  if(str=="&"){
    countEr[val]++
  }
  else{
    countSu[val]++
  }
  val=str=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
    print "\t"i,countSu[i]
    sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
    print "\t"i,countEr[i]
    sumEr+=countEr[i]
  }
  print "Total:"sumEr
}' Input_file
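The second-to-last field is used instead of a fixed field number because the second row's Statement 1 text (starting/started) contains a slash of its own. With FS='/', that row gets one extra field, yet $(NF-1) still lands on the Statement 3 marker. The following sketch checks this against the sample. marker_per_line is a hypothetical helper name, and the sketch assumes any POSIX awk.

```shell
#!/bin/sh
# Recreate the question's out.txt in a temporary file (removed on exit).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
Statement 1                        Statement 2  Statement 3    Statement 4
The declaration is not done         /   Exp     /   *       /  This is expected
The declaration is starting/started /   St      /   *       /  This is not expected
The declaration is not yet designed /   Yt      /   &       /  This is a major one
The declaration is confirmed        /   Exp     /   *       /  This is okay
The declaration is not confirmed    /   Ntp     /   &       /  This is a major issue
EOF

# Hypothetical helper: for every line containing a slash, print NF and the
# trimmed second-to-last field, which is where EDIT2 looks for the marker.
marker_per_line() {
  awk -F'/' 'NF > 1 {
    m = $(NF-1)
    gsub(/ +/, "", m)     # remove the padding around * or &
    print NF, m
  }' "$1"
}

marker_per_line "$sample"
```

Each output line is the field count followed by the trimmed marker. The 5 on the second data row is the extra field introduced by starting/started.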


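EDIT: A generic solution. The names of all errors can be passed in a variable, so there is no need to hard-code every condition as in my previous solution. Please try the following. It was written and tested with GNU awk only, against the samples you showed.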

awk -v errors="Ntp,Yt"  '
BEGIN{
  num=split(errors,arr,",")
  for(i=1;i<=num;i++){
    errorVal[arr[i]]
  }
}
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+|\//,"",val)
  if(val in errorVal){
    countEr[val]++
  }
  else{
    countSu[val]++
  }
  val=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
    print "\t"i,countSu[i]
    sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
    print "\t"i,countEr[i]
    sumEr+=countEr[i]
  }
  print "Total:"sumEr
}'  Input_file
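In the BEGIN block, the errors list becomes a lookup table. split() breaks the -v string on commas, and each name becomes an array index, so (val in errorVal) is an exact-match membership test. The following is a minimal sketch of just that step, assuming any POSIX awk. is_error and ERRORS are hypothetical names used only for illustration. The sketch assigns 1 instead of merely referencing the element, which creates the index in the same way.

```shell
#!/bin/sh
# Hypothetical variable: comma-separated error names, as passed via -v.
ERRORS="Ntp,Yt"

# Hypothetical helper: print 1 if $1 is one of the error names, else 0.
is_error() {
  awk -v errors="$ERRORS" -v name="$1" 'BEGIN {
    n = split(errors, arr, ",")       # arr[1]="Ntp", arr[2]="Yt"
    for (i = 1; i <= n; i++)
      errorVal[arr[i]] = 1            # each name becomes an array index
    if (name in errorVal) print 1; else print 0
  }'
}

is_error Yt    # prints 1
is_error Exp   # prints 0
is_error Y     # prints 0
```

Because the test compares whole array indices, a partial name such as Y is not treated as an error. A substring or regex check could get this wrong.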

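Explanation: below is a detailed, commented version of the original solution, which hard-codes the error names Ntp and Yt.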

awk '                                                 ##Starting awk program from here.
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){      ##Using match function to match spaces, a slash, the value, spaces and a slash (as per samples) to get the value.
val=substr($0,RSTART,RLENGTH)                       ##Saving sub-string into variable val from RSTART to RLENGTH here.
gsub(/[[:space:]]+|\//,"",val)                     ##Removing spaces and slashes (replacing them with NULL) in val here.
if(val=="Ntp" || val=="Yt"){                        ##Checking condition if value is either Ntp OR Yt then do following.
  countEr[val]++                                   ##Increase count for array countEr with 1 with index of val here.
}
else{                                               ##Else do following.
  countSu[val]++                                   ##Increase count of array countSu with index of val here.
}
val=""                                              ##Nullifying val here.
}
END{                                                  ##Starting END block of this program here.
print "Out:" ORS "Warnings:"                        ##Printing string Out, a new line and Warnings here.
for(i in countSu){                                  ##Traversing through countSu here.
  print "\t"i,countSu[i]                           ##Printing a tab, index of array and value of countSu here.
  sumSu+=countSu[i]                                ##Keep on adding value of countSu current item into sumSu variable here.
}
print "Total:"sumSu                                 ##Printing Total string with sumSu value here.
print "Errors:"                                     ##Printing string Errors here.
for(i in countEr){                                  ##Traversing through countEr here.
  print "\t"i,countEr[i]                           ##Printing a tab, index i and countEr value here.
  sumEr+=countEr[i]                                ##Keep on adding value of countEr current item into sumEr variable here.
}
print "Total:"sumEr                                 ##Printing Total string with sumEr value here.
}'  Input_file                                        ##Mentioning Input_file name here.
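The following sketch runs the same match() regex on single sample rows to show what substr() receives. The second sample row is the interesting one because its Statement 1 text contains an unpadded slash. The pattern requires whitespace before the first slash, so the match skips starting/started and begins at the space in front of the first " / " separator. extract_statement2 is a hypothetical helper name. The sketch assumes an awk with POSIX character classes, such as gawk or a recent mawk. It writes [^\/] so that the slash inside the bracket expression is escaped explicitly.

```shell
#!/bin/sh
# Hypothetical helper: apply the explained match()/substr()/gsub() steps to
# one line and report RSTART, RLENGTH and the cleaned Statement 2 value.
extract_statement2() {
  printf '%s\n' "$1" | awk '
    match($0, /[[:space:]]+\/[^\/]*[[:space:]]+\//) {
      val = substr($0, RSTART, RLENGTH)   # the "<spaces>/<value>/" chunk
      gsub(/[[:space:]]+|\//, "", val)    # drop the spaces and both slashes
      printf "RSTART=%d RLENGTH=%d val=%s\n", RSTART, RLENGTH, val
    }'
}

# Second sample row: "starting/started" has no space before its slash,
# so the match begins at the space in front of the first " / " separator.
extract_statement2 'The declaration is starting/started /   St      /   *       /  This is not expected'
# prints: RSTART=36 RLENGTH=14 val=St
```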

Another gawk alternative, relying on gawk's "true multidimensional arrays" (arrays of arrays):

$ cat tst.awk

BEGIN {
FS="[[:blank:]]/[[:blank:]]"
OFS=" : "
}
FNR>1{
gsub(/[[:blank:]]/, "", $2)
gsub(/[[:blank:]]/, "", $3)
a[$3][$2]++
}
END {
#PROCINFO["sorted_in"]="@ind_str_desc"
print "Out" OFS
for(i in a) {
print (i=="*"?"Warnings":"Errors") OFS
t=0
for(j in a[i]) {
print "\t" j, a[i][j]
t+=a[i][j]
}
print "Total", t
t=0
}
}

Running gawk -f tst.awk myFile produces:

Out :
Warnings :
St : 1
Exp : 2
Total : 3
Errors :
Ntp : 1
Yt : 1
Total : 2
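POSIX awk and mawk do not support gawk's arrays of arrays. The following portable sketch performs the same grouping using only POSIX awk features, assuming an awk that supports bracket classes such as [[:blank:]]. It uses composite keys count[mark, name], which awk joins internally with SUBSEP. It also records the first-seen order of names, so the output order is deterministic, unlike a plain for (j in a) loop, whose order is unspecified. It prints the labels from the question with uniform "name : count" spacing. summarize is a hypothetical helper name.

```shell
#!/bin/sh
# Recreate the question's out.txt in a temporary file (removed on exit).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
Statement 1                        Statement 2  Statement 3    Statement 4
The declaration is not done         /   Exp     /   *       /  This is expected
The declaration is starting/started /   St      /   *       /  This is not expected
The declaration is not yet designed /   Yt      /   &       /  This is a major one
The declaration is confirmed        /   Exp     /   *       /  This is okay
The declaration is not confirmed    /   Ntp     /   &       /  This is a major issue
EOF

# Hypothetical helper: group Statement 2 values by the Statement 3 marker
# using composite keys instead of gawk's arrays of arrays.
summarize() {
  awk -F'[[:blank:]]/[[:blank:]]' '
    FNR > 1 {
      name = $2; mark = $3
      gsub(/[[:blank:]]/, "", name)
      gsub(/[[:blank:]]/, "", mark)
      if (!((mark, name) in count))       # remember first-seen order per group
        order[mark, ++n[mark]] = name
      count[mark, name]++
      total[mark]++
    }
    END {
      print "Out:"
      split("* &", marks, " ")
      split("Warnings Errors", titles, " ")
      for (m = 1; m <= 2; m++) {
        mark = marks[m]
        print titles[m] ":"
        for (i = 1; i <= n[mark]; i++) {
          name = order[mark, i]
          print name " : " count[mark, name]
        }
        print "Total : " (total[mark] + 0)
      }
    }' "$1"
}

summarize "$sample"
```

On the sample, this prints Exp and St under Warnings (totals 2, 1 and 3) and Yt and Ntp under Errors (totals 1, 1 and 2), in the order in which they first appear in the file.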
