Awk: get 2 columns and count duplicate values in a new column



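The script below is meant to create a new CSV from the gender and state columns, counting duplicate values grouped by state. It doesn't seem to work, though: the new CSV I get is empty. Command: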

gawk -f scrt.awk ml1.csv > ml2.csv

Script:

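#!/usr/bin/awk -f
# Split input fields and join output fields on commas
BEGIN { FS=OFS="," }
# Skip the header line; count each "state,gender" pair ($12 = state, $9 = gender)
FNR>1 { counts[$12 OFS $9]++ }
# Print each pair with its count (for-in traversal order is unspecified)
END   { for (i in counts) print i,counts[i] }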

CSV input:

nw,d,nm,year,date,mns,arm,age,gender,rc,city,state,sg
x,x,pac,2015,2015-01-02,sur,les,53,Male,A,Shelton,WA,x
x,x,ces,2015,2015-01-02,sur,les,53,Female,A,Shelton,WA,x
x,x,ret,2015,2015-01-06,sur,ml apon,53,Male,A,Shelton,OR,x
x,x,set,2015,2015-01-02,sur,les,47,Male,W,Aloha,OR,x
x,x,wem,2015,2015-01-04,sur,ml apon,32,Male,W,San Francisco,CA,x

Expected output:

state,gender,count
WA,Male,1
WA,Female,1
OR,Male,2
CA,Male,1
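The expected output above also starts with a `state,gender,count` header and lists each pair in the order it first appears in the input, whereas `for (i in counts)` visits keys in an unspecified order. A minimal sketch that reproduces it exactly in any POSIX awk, assuming the column positions from the sample (`$12` = state, `$9` = gender); the shell function name `count_state_gender` is hypothetical:

```shell
# Hypothetical helper: reads CSV from the given files, or stdin if none.
count_state_gender() {
  awk 'BEGIN {
         FS = OFS = ","
         print "state", "gender", "count"      # header, printed once
       }
       FNR > 1 {                               # skip each input header line
         key = $12 OFS $9                      # "state,gender"
         if (!(key in counts)) order[++n] = key  # remember first-seen order
         counts[key]++
       }
       END { for (i = 1; i <= n; i++) print order[i], counts[order[i]] }' "$@"
}
```

Usage: `count_state_gender ml1.csv > ml2.csv`. Recording each key in a numerically indexed `order` array the first time it is seen keeps the output order deterministic without relying on gawk-only features.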

I ran the following script

BEGIN { FS=OFS="," }
FNR>1 { counts[$12 OFS $9]++ }
END   { for (i in counts) print i,counts[i] } 

with GNU Awk 4.2.1 on this input

nw,d,nm,year,date,mns,arm,age,gender,rc,city,state,sg
x,x,pac,2015,2015-01-02,sur,les,53,Male,A,Shelton,WA,x
x,x,ces,2015,2015-01-02,sur,les,53,Female,A,Shelton,WA,x
x,x,ret,2015,2015-01-06,sur,ml apon,53,Male,A,Shelton,OR,x
x,x,set,2015,2015-01-02,sur,les,47,Male,W,Aloha,OR,x
x,x,wem,2015,2015-01-04,sur,ml apon,32,Male,W,San Francisco,CA,x

and got the output

CA,Male,1
WA,Male,1
OR,Male,2
WA,Female,1

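which is equivalent to the expected

WA,Male,1
WA,Female,1
OR,Male,2
CA,Male,1

when line order is ignored. If you want to traverse an array in a specific order in GNU AWK, set PROCINFO["sorted_in"] in BEGIN to one of the available predefined orderings, or, if none of them suits your needs, write your own comparison function.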
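A sketch of both options, assuming GNU awk is installed as `gawk` (the shell function names and the awk function `by_count_desc` are hypothetical). `@ind_str_asc` is one of gawk's predefined orderings; a user-defined comparison function receives `(i1, v1, i2, v2)`, the two indices and their values, and returns a negative, zero, or positive number:

```shell
# Requires GNU awk: PROCINFO["sorted_in"] is a gawk extension.

# Predefined ordering: traverse array indices as strings, ascending.
count_sorted_by_key() {
  gawk 'BEGIN {
          FS = OFS = ","
          PROCINFO["sorted_in"] = "@ind_str_asc"
        }
        FNR > 1 { counts[$12 OFS $9]++ }
        END { for (i in counts) print i, counts[i] }' "$@"
}

# Custom comparison function: highest count first, ties broken by key.
count_sorted_by_freq() {
  gawk 'function by_count_desc(i1, v1, i2, v2) {
          if (v1 != v2) return v2 - v1       # larger count sorts first
          return (i1 < i2) ? -1 : (i1 > i2)  # then by the "state,gender" string
        }
        BEGIN {
          FS = OFS = ","
          PROCINFO["sorted_in"] = "by_count_desc"
        }
        FNR > 1 { counts[$12 OFS $9]++ }
        END { for (i in counts) print i, counts[i] }' "$@"
}
```

On the sample input, `count_sorted_by_key ml1.csv` lists CA, then OR, then the two WA rows, while `count_sorted_by_freq ml1.csv` puts `OR,Male,2` first. Breaking ties on the key keeps the custom ordering deterministic.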
