I have a file named out.txt that looks like this:
Statement 1 Statement 2 Statement 3 Statement 4
The declaration is not done / Exp / * / This is expected
The declaration is starting/started / St / * / This is not expected
The declaration is not yet designed / Yt / & / This is a major one
The declaration is confirmed / Exp / * / This is okay
The declaration is not confirmed / Ntp / & / This is a major issue
I need to summarize and categorize the rows by the third column (Statement 3): a * means a warning and an & means an error, like this:
Out:
Warnings:
Exp : 2
St : 1
Total : 3
Errors:
Yt : 1
Ntp: 1
Total :2
I tried the following code, but it does not give the exact output:
#!/bin/bash
echo " " ;
File="out.txt"
for z in out.txt;
do
if grep -q "&" $z/"$File"; then
echo "$z:";
awk -F' / '
{ a[$2]++ }
END{ for(j in a){ print j, a[j]; s=s+a[j] };
print "Total :", s}' out.txt
else
echo "$z:";
done
EDIT2:
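Since the OP confirmed that there are no fixed keywords for errors, and that the class should be decided by the & in the second-to-last field of each line, try the following.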
awk -F'/' '
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+|\//,"",val)
  str=$(NF-1)
  gsub(/ +/,"",str)
  if(str=="&"){
    countEr[val]++
  }
  else{
    countSu[val]++
  }
  val=str=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
    print "\t"i,countSu[i]
    sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
    print "\t"i,countEr[i]
    sumEr+=countEr[i]
  }
  print "Total:"sumEr
}' Input_file
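To try the idea on the sample data from the question, here is a minimal sketch of a simpler variant. It is not the code above: it assumes the columns are always separated by exactly " / " (space, slash, space), so it splits on that instead of using match(). The names summary, warn and err are made up for this illustration.

```shell
#!/bin/sh
# Simplified variant (assumption: fields are always separated by " / ").
# Splitting on " / " keeps the bare "/" in "starting/started" inside field 1.
summary=$(awk -F' / ' '
NF >= 3 {                                # the header line has no " / ", so NF is 1 there
    if ($(NF-1) == "&") err[$2]++        # second-to-last field decides the class
    else                warn[$2]++
}
END {
    print "Warnings:"
    for (k in warn) { print k " : " warn[k]; w += warn[k] }
    print "Total : " (w + 0)
    print "Errors:"
    for (k in err) { print k " : " err[k]; e += err[k] }
    print "Total : " (e + 0)
}' <<'EOF'
Statement 1 Statement 2 Statement 3 Statement 4
The declaration is not done / Exp / * / This is expected
The declaration is starting/started / St / * / This is not expected
The declaration is not yet designed / Yt / & / This is a major one
The declaration is confirmed / Exp / * / This is okay
The declaration is not confirmed / Ntp / & / This is a major issue
EOF
)
printf '%s\n' "$summary"
```

The order of the entries inside each group may differ between awk implementations, because the order of a for (k in array) loop is unspecified.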
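EDIT: A generic solution, where all the error names can be passed in a variable, so we don't need to hard-code every condition as in my previous solution. Could you please try the following, written and tested only with GNU awk against the samples shown.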
awk -v errors="Ntp,Yt" '
BEGIN{
  num=split(errors,arr,",")
  for(i=1;i<=num;i++){
    errorVal[arr[i]]
  }
}
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+|\//,"",val)
  if(val in errorVal){
    countEr[val]++
  }
  else{
    countSu[val]++
  }
  val=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
    print "\t"i,countSu[i]
    sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
    print "\t"i,countEr[i]
    sumEr+=countEr[i]
  }
  print "Total:"sumEr
}' Input_file
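The BEGIN block above turns the comma-separated errors variable into a lookup set. A small sketch of just that step, run in isolation; the probe array and the result variable are hypothetical names used only for this demonstration:

```shell
#!/bin/sh
# Build a lookup set from a comma-separated list, then test membership with "in".
result=$(awk -v errors="Ntp,Yt" '
BEGIN {
    n = split(errors, arr, ",")                  # arr[1]="Ntp", arr[2]="Yt"
    for (i = 1; i <= n; i++) errorVal[arr[i]]    # referencing an element creates it
    m = split("Exp St Yt Ntp", probe, " ")       # codes taken from the sample file
    for (i = 1; i <= m; i++)
        print probe[i], ((probe[i] in errorVal) ? "error" : "warning")
}')
printf '%s\n' "$result"
```

Testing with (key in array) does not create the key as a side effect, which a direct array reference would do. That is why the main program uses val in errorVal for the check.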
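Explanation: a detailed, line-by-line explanation of the above. The annotated version below hard-codes the Ntp and Yt checks instead of reading them from a variable.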
awk '                                            ##Starting awk program from here.
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){   ##Using match function to find whitespace, slash, the value, whitespace and a closing slash, as per samples.
  val=substr($0,RSTART,RLENGTH)                  ##Saving sub-string of length RLENGTH, starting at RSTART, into variable val.
  gsub(/[[:space:]]+|\//,"",val)                 ##Removing spaces and slashes from val here.
  if(val=="Ntp" || val=="Yt"){                   ##Checking condition if value is either Ntp OR Yt then do following.
    countEr[val]++                               ##Increase count for array countEr with 1 with index of val here.
  }
  else{                                          ##Else do following.
    countSu[val]++                               ##Increase count of array countSu with index of val here.
  }
  val=""                                         ##Nullifying val here.
}
END{                                             ##Starting END block of this program here.
  print "Out:" ORS "Warnings:"                   ##Printing string Out, a new line and Warnings here.
  for(i in countSu){                             ##Traversing through countSu here.
    print "\t"i,countSu[i]                       ##Printing tab, index of array and value of countSu here.
    sumSu+=countSu[i]                            ##Keep on adding value of countSu current item into sumSu variable here.
  }
  print "Total:"sumSu                            ##Printing Total string with sumSu value here.
  print "Errors:"                                ##Printing string Errors here.
  for(i in countEr){                             ##Traversing through countEr here.
    print "\t"i,countEr[i]                       ##Printing tab, index i and countEr value here.
    sumEr+=countEr[i]                            ##Keep on adding value of countEr current item into sumEr variable here.
  }
  print "Total:"sumEr                            ##Printing Total string with sumEr value here.
}' Input_file                                    ##Mentioning Input_file name here.
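To see what match() captures before gsub() cleans it up, the following sketch runs the same regular expression on a single sample line and prints RSTART, RLENGTH and the raw and cleaned values. The line and result shell variables are only for this demonstration.

```shell
#!/bin/sh
# One sample line from the question, including the tricky "starting/started".
line='The declaration is starting/started / St / * / This is not expected'

result=$(printf '%s\n' "$line" | awk '
match($0, /[[:space:]]+\/[^\/]*[[:space:]]+\//) {
    val = substr($0, RSTART, RLENGTH)    # raw match, still with spaces and slashes
    printf "RSTART=%d RLENGTH=%d raw=[%s]\n", RSTART, RLENGTH, val
    gsub(/[[:space:]]+|\//, "", val)     # strip whitespace and slashes
    printf "clean=[%s]\n", val
}')
printf '%s\n' "$result"
```

The slash inside "starting/started" is skipped because it is not preceded by whitespace, so the first match is the " / St /" segment and the cleaned value is St.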
Another gawk alternative, relying on gawk's "true multidimensional arrays" (arrays of arrays):
$ cat tst.awk
BEGIN {
  FS="[[:blank:]]/[[:blank:]]"
  OFS=" : "
}
FNR>1{
  gsub(/[[:blank:]]/, "", $2)
  gsub(/[[:blank:]]/, "", $3)
  a[$3][$2]++
}
END {
  #PROCINFO["sorted_in"]="@ind_str_desc"
  print "Out" OFS
  for(i in a) {
    print (i=="*"?"Warnings":"Errors") OFS
    t=0
    for(j in a[i]) {
      print "\t" j, a[i][j]
      t+=a[i][j]
    }
    print "Total", t
    t=0
  }
}
gawk -f tst.awk myFile
Results in:
Out :
Warnings :
	St : 1
	Exp : 2
Total : 3
Errors :
	Ntp : 1
	Yt : 1
Total : 2
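Arrays of arrays are a gawk extension. Where only a POSIX awk is available, a similar grouping can be sketched with a single array and composite keys joined by SUBSEP. This swaps in a different technique from the one above, and the names count, marker, label and summary are assumptions made for this example.

```shell
#!/bin/sh
# Portable variant: one flat array keyed by "marker SUBSEP code".
summary=$(awk -F' / ' '
FNR > 1 { count[$3 SUBSEP $2]++ }        # skip the header, count per (marker, code)
END {
    n = split("* &", marker, " ")        # fixed group order: warnings, then errors
    label["*"] = "Warnings"; label["&"] = "Errors"
    for (m = 1; m <= n; m++) {
        print label[marker[m]] " :"
        total = 0
        for (key in count) {
            split(key, part, SUBSEP)     # part[1] = marker, part[2] = code
            if (part[1] != marker[m]) continue
            print "\t" part[2] " : " count[key]
            total += count[key]
        }
        print "Total : " total
    }
}' <<'EOF'
Statement 1 Statement 2 Statement 3 Statement 4
The declaration is not done / Exp / * / This is expected
The declaration is starting/started / St / * / This is not expected
The declaration is not yet designed / Yt / & / This is a major one
The declaration is confirmed / Exp / * / This is okay
The declaration is not confirmed / Ntp / & / This is a major issue
EOF
)
printf '%s\n' "$summary"
```

Iterating over a fixed marker list keeps Warnings ahead of Errors regardless of the awk in use, whereas for (i in a) visits keys in an unspecified order unless something like gawk's PROCINFO["sorted_in"] is set.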