This is my awk script, filtered.awk; it works for a single input file:
#Field Separator
BEGIN { FS="[,:\"]" }
#Searching and Storing in an Array
/searchKeyword/ {a[$5]=a[$5]OFS$6}
#Looping on Array
END {
for (k in a)
{
print FILENAME, k, gsub(OFS,OFS,a[k]) > ("output_" FILENAME)
}
}
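To see why the script reads the key from $5 and the suffix from $6, here is a small demonstration on one record from input1.txt. It assumes the intended separator is the bracket expression [,:"] (written "[,:\"]" inside an awk string), so every comma, colon and double quote separates fields and the quotes around each value produce empty fields. The second awk call shows the counting trick: gsub() returns the number of substitutions it makes, so gsub(OFS,OFS,a[k]) counts the OFS characters prepended once per matching record, which agrees with a plain counter. The names line, fields, counts, s and c are only illustrative.

```shell
#!/bin/sh
# One record copied from input1.txt in the question.
line='"YY/XX","searchKeyword-ZZZZ.abc:06","200OK",64594889937362'

# Print NF, $5 and $6 when FS matches a comma, colon or double quote.
fields=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[,:\"]" }
{ print NF; print $5; print $6 }')
printf '%s\n' "$fields"

# Feed the same record twice and count it two ways per key:
# the gsub() trick from filtered.awk and an ordinary counter c[$5]++.
counts=$(printf '%s\n' "$line" "$line" | awk 'BEGIN { FS = "[,:\"]" }
/searchKeyword/ { s[$5] = s[$5] OFS $6; c[$5]++ }
END { for (k in s) print k, gsub(OFS, OFS, s[k]), c[k] }')
printf '%s\n' "$counts"
```

This prints 11, searchKeyword-ZZZZ.abc and 06 on separate lines, then searchKeyword-ZZZZ.abc 2 2.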
Sample input:
cat input1.txt
"YY/XX","searchKeyword-ZZZZ.abc:06","200OK",64594889937362
"YY/XX","searchKeyword-ZZZZ.abc:13","200OK",64594860937362
"YY/XX","searchKeyword-ZZZZ.abc:06","200OK",64594822937362
"YY/XX","searchKeyword-ZZZZ.abc:06","200OK",64594823937362
"YY/XX","searchKeyword-ZZZZ.pqr:13","200OK",64594890937362
"YY/XX","searchKeyword-ZZZZ.pqr:08","200OK",64594877937362
"YY/XX","searchKeyword-ZZZZ.pqr:13","200OK",64594860937362
"YY/XX","searchKeyword-ZZZZ.pqr:13","200OK",64594870937362
"YY/XX","searchKeyword-ZZZZ.cde:12","200OK",64594803937362
"YY/XX","searchKeyword-ZZZZ.cde:00","200OK",64594870937362
"YY/XX","searchKeyword-ZZZZ.cde:00","200OK",64594860937362
"YY/XX","searchKeyword-ZZZZ.cde:08","200OK",64594825193736
Second input file:
cat input2.txt
"XXX/YYY","searchKeyword-YYYYY.pqr:99910","200OK",439865231,"4334373212"
"XXX/YYY","searchKeyword-YYYYY.cde:99904","200OK",439868231,"4334953212"
"XXX/YYY","searchKeyword-YYYYY.mno:99909","200OK",439827231,"4334178212"
"XXX/YYY","searchKeyword-YYYYY.pqr:99911","200OK",439874231,"4334353212"
"XXX/YYY","searchKeyword-YYYYY.cde:99900","200OK",439893231,"4334130212"
"XXX/YYY","searchKeyword-YYYYY.mno:99910","200OK",439886231,"4334868212"
"XXX/YYY","searchKeyword-YYYYY.pqr:99905","200OK",439850231,"4334495212"
"XXX/YYY","searchKeyword-YYYYY.cde:99905","200OK",439878231,"4334131212"
"XXX/YYY","searchKeyword-YYYYY.mno:99910","200OK",439871231,"4334895212"
"XXX/YYY","searchKeyword-YYYYY.pqr:99910","200OK",439874231,"4334353212"
"XXX/YYY","searchKeyword-YYYYY.cde:99908","200OK",439848231,"4334823212"
"XXX/YYY","searchKeyword-YYYYY.mno:99914","200OK",439820231,"4334177212"
"XXX/YYY","searchKeyword-YYYYY.pqr:99910","200OK",439882231,"4334579212"
"XXX/YYY","searchKeyword-YYYYY.cde:99903","200OK",439840231,"4334966212"
"XXX/YYY","searchKeyword-YYYYY.mno:99908","200OK",439894231,"4334365212"
Third input file:
cat input3.txt
"XXX/YYY","searchKeyword-YYYYY.cde:99900","200OK",439893231,"4334130212"
"XXX/YYY","searchKeyword-YYYYY.mno:99910","200OK",439886231,"4334868212"
"XXX/YYY","searchKeyword-YYYYY.pqr:99905","200OK",439850231,"4334495212"
"XXX/YYY","searchKeyword-YYYYY.cde:99905","200OK",439878231,"4334131212"
"XXX/YYY","searchKeyword-YYYYY.mno:99910","200OK",439871231,"4334895212"
"XXX/YYY","searchKeyword-YYYYY.pqr:99910","200OK",439874231,"4334353212"
"PPP/QQQ","searchKeyword-ZZZZ.abc:06","200OK",64594822937362
"PPP/QQQ","searchKeyword-ZZZZ.abc:06","200OK",64594823937362
"PPP/QQQ","searchKeyword-ZZZZ.pqr:13","200OK",64594890937362
"PPP/QQQ","searchKeyword-ZZZZ.pqr:08","200OK",64594877937362
"PPP/QQQ","searchKeyword-ZZZZ.pqr:13","200OK",64594860937362
"PPP/QQQ","searchKeyword-ZZZZ.pqr:13","200OK",64594870937362
"PPP/QQQ","searchKeyword-ZZZZ.cde:12","200OK",64594803937362
"PPP/QQQ","searchKeyword-ZZZZ.cde:00","200OK",64594870937362
I passed the input files as shown below and got the output in the file output_input3.txt:
awk -f filtered.awk input*
cat output_input3.txt
input3.txt searchKeyword-ZZZZ.cde 6
input3.txt searchKeyword-YYYYY.cde 7
input3.txt searchKeyword-ZZZZ.pqr 8
input3.txt searchKeyword-YYYYY.pqr 7
input3.txt searchKeyword-ZZZZ.abc 6
input3.txt searchKeyword-YYYYY.mno 7
It looks like it didn't process the first two files at all.
I expect the output in dynamically generated files, as shown below:
==> output_input1.txt <==
input1.txt searchKeyword-ZZZZ.cde 4
input1.txt searchKeyword-ZZZZ.pqr 4
input1.txt searchKeyword-ZZZZ.abc 4
==> output_input2.txt <==
input2.txt searchKeyword-YYYYY.cde 5
input2.txt searchKeyword-YYYYY.pqr 5
input2.txt searchKeyword-YYYYY.mno 5
==> output_input3.txt <==
input3.txt searchKeyword-ZZZZ.cde 2
input3.txt searchKeyword-YYYYY.cde 2
input3.txt searchKeyword-ZZZZ.pqr 4
input3.txt searchKeyword-YYYYY.pqr 2
input3.txt searchKeyword-ZZZZ.abc 2
input3.txt searchKeyword-YYYYY.mno 2
But I only get output in a single file, output_input3.txt.
Any suggestions? Also, how can we split the dynamically generated output files further, as shown below:
==> output_input1_cde.txt <==
input1.txt searchKeyword-ZZZZ.cde 4
==> output_input1_pqr.txt <==
input1.txt searchKeyword-ZZZZ.pqr 4
==> output_input1_abc.txt <==
input1.txt searchKeyword-ZZZZ.abc 4
==> output_input2_cde.txt <==
input2.txt searchKeyword-YYYYY.cde 5
==> output_input2_pqr.txt <==
input2.txt searchKeyword-YYYYY.pqr 5
==> output_input2_mno.txt <==
input2.txt searchKeyword-YYYYY.mno 5
==> output_input3_cde.txt <==
input3.txt searchKeyword-ZZZZ.cde 2
input3.txt searchKeyword-YYYYY.cde 2
==> output_input3_pqr.txt <==
input3.txt searchKeyword-ZZZZ.pqr 4
input3.txt searchKeyword-YYYYY.pqr 2
==> output_input3_abc.txt <==
input3.txt searchKeyword-ZZZZ.abc 2
==> output_input3_mno.txt <==
input3.txt searchKeyword-YYYYY.mno 2
Note: I'm using awk on a Mac (awk version 20070501) and tried ENDFILE; I think ENDFILE doesn't exist in the awk on macOS.
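END runs only once, after all input has been read, so it only sees the last value of FILENAME. Also, since a is never cleared, the counts from all three files accumulate in it (which is why output_input3.txt shows 6, 7 and 8). If you are using GNU awk, try replacing END with ENDFILE and see whether that is what you want (you may need to delete a, and maybe add a close()). Using GNU awk (because of ENDFILE):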
$ cat foo.awk
#Field Separator
BEGIN { FS="[,:\"]" }
#Searching and Storing in an Array
/searchKeyword/ {a[$5]=a[$5]OFS$6}
#Looping on Array
ENDFILE { # replaced END with ENDFILE
out="output_" FILENAME # to define just once
for (k in a)
{
print FILENAME, k, gsub(OFS,OFS,a[k]) > out
}
delete a # added delete
close(out) # good habit, even with GNU awk
}
Results:
$ cat output_input1
input1 searchKeyword-ZZZZ.abc 4
input1 searchKeyword-ZZZZ.cde 4
input1 searchKeyword-ZZZZ.pqr 4
$ cat output_input2
input2 searchKeyword-YYYYY.mno 5
input2 searchKeyword-YYYYY.cde 5
input2 searchKeyword-YYYYY.pqr 5
$ cat output_input3
input3 searchKeyword-ZZZZ.abc 2
input3 searchKeyword-YYYYY.mno 2
input3 searchKeyword-ZZZZ.cde 2
input3 searchKeyword-ZZZZ.pqr 4
input3 searchKeyword-YYYYY.pqr 2
input3 searchKeyword-YYYYY.cde 2
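If you don't have GNU awk and ENDFILE available, you need to write the output in two places: in an FNR==1 block (for the previous FILENAME) and in the END block (for the last one). Of course, you can (and should) put the repeated code in a function() and call it from both blocks, but to emphasize the structure, here it is written out in full: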
#Field Separator
BEGIN { FS="[,:\"]" }
FNR==1 {
if(filename!="") { # no file before the first
out="output_" filename # using previous filename
for (k in a)
{
print filename, k, gsub(OFS,OFS,a[k]) > out
}
delete a # empty env
close(out) # close used file
}
filename=FILENAME # remember filename
}
#Searching and Storing in an Array
/searchKeyword/ {a[$5]=a[$5]OFS$6}
#Looping on Array
END {
out="output_" FILENAME
for (k in a)
{
print FILENAME, k, gsub(OFS,OFS,a[k]) > out
}
delete a # good habit, but here mostly
close(out) # for symmetry
}
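The function() refactoring suggested above might look like the following sketch. It assumes a POSIX awk such as the one on macOS (no ENDFILE); the function name flush(), the variable prev, the script name per_file.awk and the two trimmed-down input files (a few lines copied from the question) are all hypothetical. It counts with a[$5]++, as the update further down does, instead of the gsub() trick.

```shell
#!/bin/sh
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical sample files: a few lines copied from input1.txt and input3.txt.
printf '%s\n' \
  '"YY/XX","searchKeyword-ZZZZ.abc:06","200OK",64594889937362' \
  '"YY/XX","searchKeyword-ZZZZ.abc:13","200OK",64594860937362' \
  '"YY/XX","searchKeyword-ZZZZ.pqr:13","200OK",64594890937362' > input1.txt
printf '%s\n' \
  '"PPP/QQQ","searchKeyword-ZZZZ.abc:06","200OK",64594822937362' > input3.txt

cat > per_file.awk <<'EOF'
# Write the counts collected for one file to output_<file>, then reset.
function flush(file,    out, k) {
    out = "output_" file
    for (k in a)
        print file, k, a[k] > out
    close(out)
    delete a
}
BEGIN { FS = "[,:\"]" }
FNR == 1 && prev != "" { flush(prev) }   # a new file starts: flush the previous one
FNR == 1 { prev = FILENAME }             # remember the current file name
/searchKeyword/ { a[$5]++ }              # count records per key
END { if (prev != "") flush(prev) }      # flush the last file
EOF

awk -f per_file.awk input1.txt input3.txt
for f in output_input*; do printf '==> %s <==\n' "$f"; cat "$f"; done
```

A completely empty input file never reaches FNR == 1, so it gets no output file, and the line order inside each output file follows for (k in a), which is unspecified.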
Update: updated per the request in the comments. Sorry, I completely missed that part the first time.
#Field Separator
BEGIN { FS="[,:\"]" }
FNR==1 {
if(filename!="") { # no file before the first
for (k in a)
{
n=split(k,f,".") # get the abc etc
out="output_" filename "_" f[n] ".txt" # construct the filename
print filename, k, a[k] >> out # appending to files
close(out) # spare the fds
}
delete a # empty env
}
filename=FILENAME # remember filename
}
#Searching and Storing in an Array
/searchKeyword/ {a[$5]++} # changed the counting
#Looping on Array
END {
for (k in a)
{
n=split(k,f,".") # etc
out="output_" filename "_" f[n] ".txt" # construct
print filename, k, a[k] >> out # append
close(out) # fds
}
}
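Because filename still carries the .txt extension, "output_" filename "_" f[n] ".txt" yields names such as output_input1.txt_cde.txt for the question's input1.txt rather than output_input1_cde.txt. A sketch that strips the extension first with sub(/\.txt$/, ...) might look like this; it reuses the hypothetical flush()/prev structure, and per_suffix.awk and the sample files (a few lines copied from the question) are likewise made up for the demonstration.

```shell
#!/bin/sh
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf '%s\n' \
  '"YY/XX","searchKeyword-ZZZZ.abc:06","200OK",64594889937362' \
  '"YY/XX","searchKeyword-ZZZZ.cde:12","200OK",64594803937362' \
  '"YY/XX","searchKeyword-ZZZZ.cde:00","200OK",64594870937362' > input1.txt
printf '%s\n' \
  '"XXX/YYY","searchKeyword-YYYYY.cde:99900","200OK",439893231,"4334130212"' \
  '"PPP/QQQ","searchKeyword-ZZZZ.cde:12","200OK",64594803937362' > input3.txt

cat > per_suffix.awk <<'EOF'
# Write "<file> <key> <count>" lines to
# output_<file without .txt>_<text after the last dot of key>.txt
function flush(file,    base, k, n, f, out) {
    base = file
    sub(/\.txt$/, "", base)            # input1.txt -> input1
    for (k in a) {
        n = split(k, f, ".")           # searchKeyword-ZZZZ.cde -> f[n] = "cde"
        out = "output_" base "_" f[n] ".txt"
        print file, k, a[k] >> out     # append: several keys can share a file
        close(out)                     # keep the number of open files small
    }
    delete a
}
BEGIN { FS = "[,:\"]" }
FNR == 1 && prev != "" { flush(prev) }
FNR == 1 { prev = FILENAME }
/searchKeyword/ { a[$5]++ }
END { if (prev != "") flush(prev) }
EOF

awk -f per_suffix.awk input1.txt input3.txt
for f in output_*; do printf '==> %s <==\n' "$f"; cat "$f"; done
```

Since the files are opened with >>, running it again appends to the existing output_* files, so remove them before a rerun.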