How to assign awk result variables to an array, and whether awk can be used inside another awk in a loop



I've started learning bash and am completely stuck on this task. I have a comma-separated csv file with records like this:

id,location_id,organization_id,service_id,name,title,email,department
1,1,,,Name surname,department1 department2 department3,,
2,1,,,name Surname,department1,,
3,2,,,Name Surname,"department1 department2, department3",,
etc.

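I need to format it like this:

  • the name and surname must start with an uppercase letter
  • add an email record that consists of the first letter of the name and the full surname, in lowercase
  • create a new csv containing the records from the old csv with the corrected fields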

I split the csv into records using awk, because some fields contain commas between quotes ("department1 department2, department3").

#!/bin/bash
input="$HOME/test.csv"
exec 0<"$input"
while read line; do
    awk -v FPAT='"[^"]*"|[^,]*' '{
    ...
    }' "$input"
done

Inside the awk { ... } block (NF=8 for each record), I tried to use some of the field values ($1 $2 $3 $4 $5 $6 $7 $8):

#it doesn't work 
IFS=' ' read -a name_surname<<<$5 # Field 5 match to *name* in heading of csv
# Could I use inner awk with field values of outer awk ($5) to separate the field value of outer awk $5 ? 
# as an example:                                  
# $5="${awk '{${1^}${2^}}' $5}"
# where ${1^} and ${2^} fields of inner awk

name_surname[0]=${name_surname[0]^}
name_surname[1]=${name_surname[1]^}

$5="${name_surname[0]}' '${name_surname[1]}"
email_name=${name_surname[0]:0:1}
email_surname=${name_surname[1]}
domain='@domain'
$7="${email_name,}${email_surname,,}$domain" # match to field 7 *email* in heading of csv

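How can I add the field values ($1 $2 $3 $4 $5 $6 $7 $8) to an array and call the join function on each iteration of the loop, so that the record is appended to the new csv file?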

function join { local IFS="$1"; shift; echo "$*"; }
result=$(join , ${arr[@]})
echo $result >> new.csv  
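
As a side note on the array and join part of the question, here is a minimal sketch of reading awk output into a bash array and joining it back. It assumes bash 4 or later (for mapfile) and a record with no quoted commas, so a plain -F, split is enough. The helper is named join_by here (a hypothetical name) so that it does not shadow the standard join command, and the sample line and email value are made up for illustration.

```shell
#!/usr/bin/env bash
# Join all remaining arguments with the single-character separator given first.
join_by() { local IFS="$1"; shift; printf '%s\n' "$*"; }

# Hypothetical record without quoted commas, so splitting on "," is safe here.
line='2,1,,,name Surname,department1,,'

# awk prints one field per line; mapfile -t keeps empty fields as empty elements.
mapfile -t fields < <(printf '%s\n' "$line" | awk -F, '{ for (i = 1; i <= NF; i++) print $i }')

# Bash arrays are zero-based: index 6 is the seventh field (email).
fields[6]="nsurname@domain"

# Quote the expansion so empty elements and embedded spaces survive the join.
record=$(join_by , "${fields[@]}")
printf '%s\n' "$record"
```

The unquoted ${arr[@]} in the question would drop empty fields and split "name Surname" into two words, which is why the expansion is quoted in this sketch.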

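This may be what you're trying to do (using gawk for FPAT, as you already are), but without more representative sample input and the expected output it's a guess: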

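$ cat tst.sh
#!/usr/bin/env bash
awk '
BEGIN {
    OFS = ","
    # A field is either a run of non-commas or a double-quoted string (gawk only)
    FPAT = "[^" OFS "]*|\"[^\"]*\""
}
NR > 1 {
    # Split the name on whitespace: name[1] is the first name, name[n] the surname
    n = split($5, name, /[[:space:]]+/)
    $7 = tolower(substr(name[1],1,1) name[n]) "@example.com"
    print
}
' "${@:--}"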

$ ./tst.sh test.csv
1,1,,,Name surname,department1 department2 department3,nsurname@example.com,
2,1,,,name Surname,department1,nsurname@example.com,
3,2,,,Name Surname,"department1 department2, department3",nsurname@example.com,

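I put the awk script inside a shell script because that looks like what you want. Obviously you don't need to do that: you could save the awk script in a file and invoke it with awk -f.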
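
A minimal sketch of the awk -f variant, which also covers the capitalisation requirement from the question that the script above leaves out. To stay runnable with any POSIX awk it is reduced to one "name surname" pair per input line instead of full csv records. The file name fixname.awk, the cap function and the sample names are hypothetical and not part of the original answer.

```shell
# Work in a throwaway directory so the sketch leaves nothing behind.
workdir=$(mktemp -d)

cat > "$workdir/fixname.awk" <<'EOF'
# Upper-case the first letter of a word and lower-case the rest.
function cap(w) { return toupper(substr(w, 1, 1)) tolower(substr(w, 2)) }
{
    # Splitting on " " uses awk's default whitespace splitting.
    n = split($0, name, " ")
    email = tolower(substr(name[1], 1, 1) name[n]) "@example.com"
    print cap(name[1]) " " cap(name[n]) "," email
}
EOF

# Run the saved program with awk -f on two sample names.
result=$(printf '%s\n' 'name Surname' 'ANNA lee' | awk -f "$workdir/fixname.awk")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Inside tst.sh the same cap function could presumably be applied to the parts of $5 before the print, with FPAT still doing the field splitting.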

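Ed Morton's answer is the complete one.

In case it helps someone, I added a check: if several records in the CSV file produce the same email address, an index number is appended to the local part of the email. The output is written to a file.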

#!/usr/bin/env bash
input="$HOME/test.csv"
exec 0<"$input"
awk '
BEGIN {
    OFS = ","
    FPAT = "[^" OFS "]*|\"[^\"]*\""
}
(NR == 1) { print }    # header of csv
(NR > 1) {
    if (length($0) > 1) {    # exclude empty lines
        count = 0
        n = split($5, name, /[[:space:]]+/)
        email_local_part = tolower(substr(name[1],1,1) name[n])

        # array stores the email local parts seen so far
        a[i++] = email_local_part

        # count occurrences of the same local part (exact match)
        for (el in a) {
            if (a[el] == email_local_part) { count++ }
        }
        # append the occurrence number to repeated addresses
        if (count == 1) { $7 = email_local_part "@abc.com" }
        else { --count; $7 = email_local_part count "@abc.com" }
        print
    }
}
' "${@:--}" > new.csv
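
The loop above rescans every stored address for each record. A possible alternative, sketched here under the assumption that only exact duplicates should be numbered, is to keep a per-address counter in an associative array. The local parts fed in below are hypothetical sample values and the variable names are made up for illustration.

```shell
# seen[x]++ returns how many times x occurred before the current line.
result=$(printf '%s\n' nsurname jdoe nsurname nsurname |
awk '{
    k = seen[$0]++
    # First occurrence gets no suffix, later ones get 1, 2, ...
    print $0 (k ? k : "") "@abc.com"
}')
printf '%s\n' "$result"
```

This gives the same numbering as the script (no suffix, then 1, 2, and so on) while doing a single array lookup per record.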
