AWK file transformation



I have a file in the following format:

Total:89.3    
User: user1
Count:3
Sum:80
departmentId: dept1
Amount by departmentId: 20
departmentId: dept1
Amount by departmentId: 35
departmentId: dept2
Amount by departmentId: 25
User: user2
Count:3
Sum:7.199999999999999
departmentId: dept1
Amount by departmentId: 2.4
departmentId: dept2
Amount by departmentId: 2.4
departmentId: dept3
Amount by departmentId: 2.4
User: user3
Count:1
Sum:0.2
departmentId: dept2
Amount by departmentId: 0.2
User: user4
Count:2
Sum:2
departmentId: dept3
Amount by departmentId: 1
departmentId: dept3
Amount by departmentId: 1

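The file lists the dues each user owes per department. If the same user has several entries for one department, they must be merged into a single row. The output file must use the format below. For example, user1 has two dues for dept1 and one for dept2, so in the output the two dept1 dues are combined into one row. The count in the header row is the number of unique user/department combinations.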

Format:
count_of_unique_user_dept_rows total_sum   -- note: header row holding the number of unique user/dept rows and the total sum
userId+deptId sum for that dept
Example:
7 89.3
user1dept1 55
user1dept2 25
user2dept1 2.4
user2dept2 2.4
user2dept3 2.4
user3dept2 0.2
user4dept3 2

What I have so far:

# This awk script is used to convert the input of library credit/debit's to the required Student Accounts Load format
BEGIN { FS=": *" }
{
    gsub(/^ +| +$/,"")
    f[$1] = $2
}
/Amount/ {
    dept = f["departmentId"]
    total = f["Total"]
    sum[dept] += $2
    amount += $2
}
$1 == "User" {
    if (NR>1) {
        format()
    }
    user = $2
}
END { format() }
function format() {
    if ( length(sum) > 0 ) {
        for (dept in sum) {
            printf "%-9s%-12s%10.2f\n", substr(user,1,9), substr(dept,1,12), sum[dept]
        }
        delete sum
        amount = 0
    }
}

The script above gives me the data rows. I can't figure out how to get the header row 7 89.3. Please help.

Rather than reading the file twice, I'd just save the output in an array and print it at the end. Here's how:

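Step 1: Fix the syntax error you'd get from some awks, which would assume sum is a scalar because length(sum) is called before any array operation on it, by adding delete sum to the BEGIN section. (You could simply remove the length(sum) test, since it does nothing useful in your code, but I wanted to explain the problem and how to solve it in general.)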

BEGIN { FS=": *"; delete sum }
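The effect of that delete can be seen in isolation. This is a minimal sketch, assuming an awk that accepts both delete on a whole array and length() on an array (gawk and current mawk do; very old awks may not). The shell variable name result is only for illustration.

```shell
# "delete sum" marks sum as an (empty) array before anything else touches it,
# so a later length(sum) is an array operation rather than a scalar one.
result=$(awk 'BEGIN { delete sum; print length(sum) }')
echo "$result"
```

Without the delete, an awk that has not yet seen sum used as an array is free to treat length(sum) as the length of an empty string, or to reject the later array use.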

Step 2: Change the format() function to populate an array of values to be output later, instead of printing them immediately:

function format() {
    if ( length(sum) > 0 ) {
        for (dept in sum) {
            vals[++numVals] = sprintf("%-9s%-12s%10.2f", substr(user,1,9), substr(dept,1,12), sum[dept])
        }
        delete sum
        amount = 0
    }
}

Step 3: Add a loop to the END section to do the actual printing:

END {
    format()
    for (valNr=1; valNr<=numVals; valNr++) {
        print vals[valNr]
    }
}

At this point the output is exactly the same as that of your existing script, but it sets us up to add the new functionality you need:

Step 4: Save every user + department combination as an index of an array, usrdpt[]:

/Amount/ {
    dept = f["departmentId"]
    total = f["Total"]
    sum[dept] += $2
    usrdpt[user,dept]
    amount += $2
}
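Step 4 relies on two awk behaviours: merely referencing an array element creates it, and a comma-separated subscript is joined with SUBSEP into a single string index. The sketch below shows this with made-up values. The array name seen and the shell variable pairs are hypothetical, and the count uses a for-in loop so that it does not depend on length() accepting an array.

```shell
pairs=$(awk 'BEGIN {
    seen["user1", "dept1"]          # a bare reference creates the element
    seen["user1", "dept1"]          # same index again, so nothing new is added
    seen["user1", "dept2"]          # a different pair gives a second element
    for (key in seen) n++           # count the distinct indices
    print n
}')
echo "$pairs"
```

Repeated user/department pairs therefore collapse into one index, which is what makes the array usable as a set of unique combinations.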

Step 5: In the END section, print the number of unique indices in the new usrdpt[] array before printing the saved values:

END {
    format()
    print length(usrdpt)
    for (valNr=1; valNr<=numVals; valNr++) {
        print vals[valNr]
    }
}

The result is:

$ cat tst.awk
BEGIN { FS=": *"; delete sum }
{
    gsub(/^ +| +$/,"")
    f[$1] = $2
}
/Amount/ {
    dept = f["departmentId"]
    total = f["Total"]
    sum[dept] += $2
    usrdpt[user,dept]
    amount += $2
}
$1 == "User" {
    if (NR>1) {
        format()
    }
    user = $2
}
END {
    format()
    print length(usrdpt)
    for (valNr=1; valNr<=numVals; valNr++) {
        print vals[valNr]
    }
}
function format() {
    if ( length(sum) > 0 ) {
        for (dept in sum) {
            vals[++numVals] = sprintf("%-9s%-12s%10.2f", substr(user,1,9), substr(dept,1,12), sum[dept])
        }
        delete sum
        amount = 0
    }
}


$ awk -f tst.awk file
7
user1    dept1            55.00
user1    dept2            25.00
user2    dept1             2.40
user2    dept2             2.40
user2    dept3             2.40
user3    dept2             0.20
user4    dept3             2.00

I assume you can figure out how to save the Total value and print it later.
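One possible way to finish this is shown below. Since the script already copies f["Total"] into total, printing it next to the count may be all that is needed. This is a sketch rather than the answerer's own code: the file names tst_total.awk and dues.txt and the shell variable out are made up for illustration, the unused amount variable is dropped, and the order of departments within a user is whatever for-in yields.

```shell
cat > tst_total.awk <<'EOF'
BEGIN { FS = ": *"; delete sum }
{
    gsub(/^ +| +$/, "")          # trim blanks; changing $0 re-splits the fields
    f[$1] = $2                   # remember the latest value seen for each label
}
/Amount/ {
    dept = f["departmentId"]
    total = f["Total"]           # the Total line was stored by the rule above
    sum[dept] += $2
    usrdpt[user,dept]            # one index per unique user/department pair
}
$1 == "User" {
    if (NR > 1) format()         # flush the previous user's sums
    user = $2
}
END {
    format()
    print length(usrdpt), total  # header row: unique pairs, then Total
    for (valNr = 1; valNr <= numVals; valNr++) print vals[valNr]
}
function format(   dept) {
    for (dept in sum)
        vals[++numVals] = sprintf("%-9s%-12s%10.2f", substr(user,1,9), substr(dept,1,12), sum[dept])
    delete sum
}
EOF

# The input from the question.
cat > dues.txt <<'EOF'
Total:89.3
User: user1
Count:3
Sum:80
departmentId: dept1
Amount by departmentId: 20
departmentId: dept1
Amount by departmentId: 35
departmentId: dept2
Amount by departmentId: 25
User: user2
Count:3
Sum:7.199999999999999
departmentId: dept1
Amount by departmentId: 2.4
departmentId: dept2
Amount by departmentId: 2.4
departmentId: dept3
Amount by departmentId: 2.4
User: user3
Count:1
Sum:0.2
departmentId: dept2
Amount by departmentId: 0.2
User: user4
Count:2
Sum:2
departmentId: dept3
Amount by departmentId: 1
departmentId: dept3
Amount by departmentId: 1
EOF

out=$(awk -f tst_total.awk dues.txt)
echo "$out"
```

The header is built from the Total line as it appears in the file, so it reads 89.3 even though the individual amounts add up to 89.4.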

Using GNU awk and 2D arrays:

$ awk '
$1=="User:" {                                 # store user
    u=$NF
}
$1=="departmentId:" {                         # store dept
    d=$NF
}
$1=="Amount" {
    if(a[u][d]=="")                           # count uniq user/depts
        c++
    s+=$NF                                    # total sum
    a[u][d]+=$NF                              # user/dept sum
}
END {
    printf "%s %.2f\n",c,s                    # output count and total
    for(u in a)
        for(d in a[u])
            printf "%s %s %.2f\n",u,d,a[u][d] # output user/dept sums
}' file

Output:

7 89.40
user1 dept1 55.00
user1 dept2 25.00
user2 dept1 2.40
user2 dept2 2.40
user2 dept3 2.40
user3 dept2 0.20
user4 dept3 2.00
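The same idea can be sketched without GNU-only arrays of arrays, using a SUBSEP-joined index instead. The version below is an illustrative variant, not part of the answer above: the array name order, the file name short.txt and the shortened sample input (the user1 and user3 blocks from the question) are assumptions. It also records the order in which pairs first appear, so the output order does not depend on how for-in traverses the array.

```shell
# Shortened sample input: two users taken from the question.
cat > short.txt <<'EOF'
User: user1
departmentId: dept1
Amount by departmentId: 20
departmentId: dept1
Amount by departmentId: 35
departmentId: dept2
Amount by departmentId: 25
User: user3
departmentId: dept2
Amount by departmentId: 0.2
EOF

out=$(awk '
$1 == "User:"         { u = $NF }               # store user
$1 == "departmentId:" { d = $NF }               # store dept
$1 == "Amount" {
    if (!((u, d) in a))                         # first time this pair is seen
        order[++c] = u SUBSEP d                 # remember it and count it
    s += $NF                                    # running total of all amounts
    a[u, d] += $NF                              # sum per user/dept pair
}
END {
    printf "%d %.2f\n", c, s                    # header: pair count and total
    for (i = 1; i <= c; i++) {
        split(order[i], k, SUBSEP)              # k[1] is the user, k[2] the dept
        printf "%s %s %.2f\n", k[1], k[2], a[order[i]]
    }
}' short.txt)
echo "$out"
```

As in the GNU awk version, the total in the header is computed from the amounts rather than read from the Total line.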
