AWK: match two files on a condition and count the records for each unique identifier



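I am working with two files whose records are matched on an account number. There are one or more conditions based on a value in the second file, and I need a count of the records that match each condition.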

The first file is fixed-width, with the account number in columns 1 to 8:

68541561        12531563     20211205154331……NN061
68541562        12531563     20211205154332……NN061
68541563        12531563     20211205154333……NN000
68541564        12531563     20211205154334……NN061
68541565        12531563     20211205154335……NN000
68541566        12531563     20211205154336……NN061

The second file is comma-delimited and structured like a definition file, but it contains duplicate records:

68541561,Customer Proc 1
68541565,Answer
68541561,Customer Proc 1
68541562,Customer Proc 1
68541561,Customer Proc 1
68541563,Answer
68541562,Customer Proc 1
68541564,Customer Proc 1
68541565,Answer 
68541564,Customer Proc 1
68541565,Answer
68541561,Customer Proc 1
68541562,Customer Proc 1
68541563,Answer

Expected output, which appends the count to each record from the first file:

68541561        12531563     20211205154331……NN0614
68541562        12531563     20211205154332……NN0613
68541563        12531563     20211205154333……NN0002
68541564        12531563     20211205154334……NN0612
68541565        12531563     20211205154335……NN0003
68541566        12531563     20211205154336……NN0610

I do have a script I am working on, but it only shows a count of 1 and seems to read only the first file.

awk -f test.awk pass=0 testfile2.dat pass=1 testfile.txt

BEGIN{
}
pass==0{
ACT=substr($1)
RES[ACT]=$2
}
pass==1{
FS=","
ACT=substr($0,1,8)

##LIST[ACT]=RESCODE
LIST[ACT]=ACT

if((RES[ACT]=="Customer Proc 1")){ OTHCUST1++ }
if((RES[ACT]=="Customer Proc 2")){ OTHCUST2++ }
if((RES[ACT]=="Customer Proc 3")){ OTHCUST3++ }
if((RES[ACT]=="Customer Proc 4")){ OTHCUST4++ }
if((RES[ACT]=="Answer")){  OTHANSW++ }
if((RES[ACT]=="Busy")){ OTHBUSY++ }
if((RES[ACT]=="Hang Up")){ OTHAM++ }

}
END{
for (nmb in LIST) {
printf "%1378s|", $0             >> "OUTFILE"
printf "%s", OTHCUST1            >> "OUTFILE"
printf "%s", OTHCUST2            >> "OUTFILE"
printf "%s", OTHCUST3            >> "OUTFILE"
printf "%s", OTHCUST4            >> "OUTFILE"
printf "%s", OTHANSW             >> "OUTFILE"
printf "%s", OTHBUSY             >> "OUTFILE"
printf "%s", OTHAM               >> "OUTFILE"
}
}

Your question isn't entirely clear, but I think this is what you're trying to do:

$ awk -F'[,[:space:]]+' 'NR==FNR{cnt[$1]++; next} {print $0 cnt[$1]+0}' file2 file1
68541561        12531563     20211205154331……NN0614
68541562        12531563     20211205154332……NN0613
68541563        12531563     20211205154333……NN0002
68541564        12531563     20211205154334……NN0612
68541565        12531563     20211205154335……NN0003
68541566        12531563     20211205154336……NN0610

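By the way, doing FS="," where the posted code does it is too late: by the time that statement executes, the first line of input has already been read and split into fields. One way to fix that is to change FS="," to if (FNR==1) { FS=","; $0=$0 } so that awk re-splits the record after FS is set. Re-splitting every record would be inefficient, so the if restricts it to the first line read; after that, FS is already set before the second and subsequent lines are read.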
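To make the timing issue concrete, here is a minimal sketch on a hypothetical two-line, comma-separated input (the data and the variable names `input`, `late`, and `fixed` are made up for illustration). In a POSIX-conforming awk such as gawk or mawk, the first command should print the whole first line as $1, because that record was split on whitespace before FS changed; some older awk implementations split fields lazily and may behave differently.

```shell
# Hypothetical sample input: two comma-separated lines.
input='a,b
c,d'

# FS is assigned inside a rule, i.e. after the first record has been read,
# so the new separator only takes effect from the second record onwards.
late=$(printf '%s\n' "$input" | awk '{ FS="," } { print $1 }')

# Assign FS on the first record only, then force a re-split with $0=$0.
fixed=$(printf '%s\n' "$input" | awk 'FNR==1 { FS=","; $0=$0 } { print $1 }')

printf 'late:\n%s\n' "$late"
printf 'fixed:\n%s\n' "$fixed"
```

When the separator is known up front, passing it with -F or as a FS="," assignment on the command line before the file name avoids the problem entirely, which is what both answers here do.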

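All of the printf "%s"s in your code should be printf "%d"s, by the way; otherwise you will get empty strings instead of zeros whenever the condition that sets a count variable is never met.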
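A small sketch of the difference, using a hypothetical counter `n` that is deliberately never assigned (the shell variable names are also made up for illustration):

```shell
# n is never incremented, so it is an uninitialized awk variable.
unset_as_s=$(awk 'BEGIN { printf "[%s]", n }')   # formatted as a string
unset_as_d=$(awk 'BEGIN { printf "[%d]", n }')   # formatted as a number

echo "$unset_as_s $unset_as_d"   # prints "[] [0]"
```

An uninitialized variable is both the empty string and zero, so the format specifier decides which of the two shows up in the output.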

With your shown samples, please try the following awk code.

awk '
FNR==NR{
arr[$1]++
next
}
($1 in arr){
print $0 arr[$1]
delete arr[$1]
}
' FS="," file2 FS=" " file1

Explanation: a detailed explanation of the above code.

awk '                        ##Starting awk program from here.
FNR==NR{                     ##Checking condition which will be TRUE when file2 is being read.
arr[$1]++                  ##Creating array with name arr with index of $1.
next                       ##next will skip statements from here.
}
($1 in arr){                 ##Checking condition if $1 is present in arr.
print   $0 arr[$1]         ##printing current line here with arr[$1].
delete arr[$1]             ##Deleting arr entry with $1 here.
}
' FS="," file2 FS=" " file1  ##Set FS as comma for file2 and space for file1 and pass Input_files too here.
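One thing to note about the code above: because of the ($1 in arr) condition, it prints only accounts that occur in file2, so a record such as 68541566 in the expected output (count 0) would be skipped. A possible variant, sketched here on hypothetical miniature inputs (accounts A, B, C and the temporary file paths are made up for illustration), drops that condition and appends arr[$1]+0 so that unmatched accounts get a 0:

```shell
tmp=$(mktemp -d)

# Hypothetical inputs: account C has no rows in file2.
printf '%s\n' 'A,x' 'B,y' 'A,x' > "$tmp/file2"
printf '%s\n' 'A  r1' 'B  r2' 'C  r3' > "$tmp/file1"

result=$(awk '
FNR==NR { arr[$1]++; next }   # first file (file2): count rows per account
{ print $0 arr[$1]+0 }        # second file (file1): append the count, 0 if never seen
' FS="," "$tmp/file2" FS=" " "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

On this input the sketch prints A  r12, B  r21, and C  r30, one per line. The +0 forces a numeric result, so a missing array entry is rendered as 0 rather than as an empty string.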
