Calculating a per-group cumulative sum and cumulative percentage of total for a column



I have a large table of values in the following format:

apple   1   1 
apple   2   1
apple   3   1
apple   4   1
banana  25  4
banana  35  10
banana  36  10
banana  37  10

Column 1 contains many different fruits, and each fruit has a different number of rows.

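For each fruit in column 1, I want to compute the cumulative sum of column 3 and, on each row, that cumulative sum as a percentage of the fruit's total, and append both as new columns. The desired output looks like this: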

apple   1   1   1   25.00 
apple   2   1   2   50.00
apple   3   1   3   75.00
apple   4   1   4   100.00
banana  25  4   4   11.76   
banana  35  10  14  41.18
banana  36  10  24  70.59
banana  37  10  34  100.00

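I can get part of the way there with awk, but I'm struggling with how to reset the cumulative sum at each new fruit. Here is my terrible awk attempt, for your viewing pleasure: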

#!/bin/bash
awk '{cumsum += $3; $3 = cumsum} 1' fruitfile > cumsum.tmp
total=$(awk '{total=total+$3}END{print total}' fruitfile)
awk -v total=$total '{ printf ("%s\t%s\t%s\t%.5f\n", $1, $2, $3, ($3/total)*100)}' cumsum.tmp > cumsum.txt
rm cumsum.tmp

Could you please try the following, written and tested with the sample shown.

awk '
FNR==NR{
  a[$1]+=$NF
  next
}
{
  sum[$1]+=($NF/a[$1])*100
  print $0,++b[$1],sum[$1]
}
' Input_file Input_file |
column -t

The output for the sample shown is as follows.

apple   1   1   1  25
apple   2   1   2  50
apple   3   1   3  75
apple   4   1   4  100
banana  25  4   1  11.7647
banana  35  10  2  41.1765
banana  36  10  3  70.5882
banana  37  10  4  100
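
In the output above, the fourth column is the per-fruit row counter ++b[$1]. It matches the running sum only for apple, where every value in column 3 is 1. The following is a minimal sketch of a variant that accumulates column 3 instead and rounds the percentage to two decimals, as in the desired output. It assumes the data is in a file named fruitfile (the name used in the question) and that no fruit has a total of zero. The array names total and cumsum and the output file cumsum.txt are illustrative choices, not part of the original answer.

```shell
# Recreate the sample data from the question.
cat > fruitfile <<'EOF'
apple   1   1
apple   2   1
apple   3   1
apple   4   1
banana  25  4
banana  35  10
banana  36  10
banana  37  10
EOF

awk '
FNR==NR {                 # first pass: build the total of column 3 per fruit
    total[$1] += $3
    next
}
{                         # second pass: running sum and cumulative percentage
    cumsum[$1] += $3
    printf "%s\t%s\t%s\t%s\t%.2f\n", $1, $2, $3, cumsum[$1], 100 * cumsum[$1] / total[$1]
}
' fruitfile fruitfile > cumsum.txt

cat cumsum.txt
```

Because both arrays are keyed by the fruit name, the running sum starts from zero for each new fruit without an explicit reset, and the rows of one fruit do not have to be adjacent.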

Explanation: the answer's original command, annotated line by line.

awk '                           ##Start of the awk program.
FNR==NR{                        ##FNR==NR is true only while the first copy of Input_file is being read.
  a[$1]+=$NF                    ##Add the last field to a[$1], building the total for each fruit.
  next                          ##Skip the rest of the program for this line.
}
{                               ##This block runs during the second read of Input_file.
  sum[$1]+=($NF/a[$1])*100      ##Add the share of this row in the total a[$1], as a percentage, to the running sum[$1].
  print $0,++b[$1],sum[$1]      ##Print the line, the per-fruit row counter b[$1] and the cumulative percentage sum[$1].
}
' Input_file Input_file |       ##Input_file is named twice so that awk reads it twice.
column -t                       ##Pipe the awk output through column -t to align the columns.
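
The command above names Input_file twice, so it needs input that can be read twice. When the rows arrive on a pipe, one possible alternative is to buffer them and do the second step in an END block. This is only a sketch: the sample rows and the names row, key, val, total, cumsum and result are hypothetical, and the whole input is held in memory.

```shell
# Hypothetical sample rows arriving on a pipe instead of from a file.
result=$(
printf '%s\n' 'apple 1 1' 'apple 2 1' 'banana 25 4' 'banana 35 10' |
awk '
{
    row[NR] = $0            # buffer the whole row
    key[NR] = $1            # fruit name of this row
    val[NR] = $3            # column 3 of this row
    total[$1] += $3         # per-fruit total, complete once input ends
}
END {
    for (i = 1; i <= NR; i++) {
        cumsum[key[i]] += val[i]
        printf "%s %s %.2f\n", row[i], cumsum[key[i]], 100 * cumsum[key[i]] / total[key[i]]
    }
}'
)
printf '%s\n' "$result"
```

For very large tables the two-read form is usually the safer choice, since it stores only one number per fruit.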
