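The script below is meant to build a new CSV from the gender and state columns, counting duplicate values grouped by state, but it does not seem to work: the new CSV I get is empty.
Command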
gawk -f scrt.awk ml1.csv > ml2.csv
Script
#!/usr/bin/awk -F
BEGIN { FS=OFS="," }
FNR>1 { counts[$12 OFS $9]++ }
END { for (i in counts) print i,counts[i] }
CSV input
nw,d,nm,year,date,mns,arm,age,gender,rc,city,state,sg
x,x,pac,2015,2015-01-02,sur,les,53,Male,A,Shelton,WA,x
x,x,ces,2015,2015-01-02,sur,les,53,Female,A,Shelton,WA,x
x,x,ret,2015,2015-01-06,sur,ml apon,53,Male,A,Shelton,OR,x
x,x,set,2015,2015-01-02,sur,les,47,Male,W,Aloha,OR,x
x,x,wem,2015,2015-01-04,sur,ml apon,32,Male,W,San Francisco,CA,x
Expected output
state,gender,count
WA,Male,1
WA,Female,1
OR,Male,2
CA,Male,1
I ran the following script
BEGIN { FS=OFS="," }
FNR>1 { counts[$12 OFS $9]++ }
END { for (i in counts) print i,counts[i] }
using gawk 4.2.1 on this input
nw,d,nm,year,date,mns,arm,age,gender,rc,city,state,sg
x,x,pac,2015,2015-01-02,sur,les,53,Male,A,Shelton,WA,x
x,x,ces,2015,2015-01-02,sur,les,53,Female,A,Shelton,WA,x
x,x,ret,2015,2015-01-06,sur,ml apon,53,Male,A,Shelton,OR,x
x,x,set,2015,2015-01-02,sur,les,47,Male,W,Aloha,OR,x
x,x,wem,2015,2015-01-04,sur,ml apon,32,Male,W,San Francisco,CA,x
and got this output
CA,Male,1
WA,Male,1
OR,Male,2
WA,Female,1
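which is equivalent to the expected
WA,Male,1
WA,Female,1
OR,Male,2
CA,Male,1
if row order is ignored. By default, the order in which for (i in counts) visits the array elements is unspecified. If you want to traverse an array in a specific order in GNU AWK, set PROCINFO["sorted_in"] in BEGIN to one of the predefined orders or, if none of them suits your needs, write your own comparison function.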