如何对变量的输出进行排序



我希望能够对输入csv文件进行排序,该文件由额外列中创建的值逗号分隔。以下是输入csv文件的示例

Timestamp,Email,Name,Year,Make,Model,Car_ID,Judge_ID,Judge_Name,Racer_Turbo,Racer_Supercharged,Racer_Performance,Racer_Horsepower,Car_Overall,Engine_Modifications,Engine_Performance,Engine_Chrome,Engine_Detailing,Engine_Cleanliness,Body_Frame_Undercarriage,Body_Frame_Suspension,Body_Frame_Chrome,Body_Frame_Detailing,Body_Frame_Cleanliness,Mods_Paint,Mods_Body,Mods_Wrap,Mods_Rims,Mods_Interior,Mods_Other,Mods_ICE,Mods_Aftermarket,Mods_WIP,Mods_Overall
8/5/2018 14:10,honoland13@japanpost.jp,Hernando,2015,Acura,TLX,48,J04,Bob,0,0,2,2,4,4,0,2,4,4,2,4,2,2,2,2,2,0,4,4,4,6,2,0,4
8/5/2018 15:11,nlighterness2q@umn.edu,Noel,2015,Jeep,Wrangler,124,J02,Carl,0,6,4,2,4,6,6,4,4,4,6,6,6,6,6,4,6,6,6,6,6,4,6,4,6
8/5/2018 17:10,eguest47@microsoft.com,Edan,2015,Lexus,Is250,222,J05,Adrian,0,0,0,0,0,0,0,0,6,6,6,0,0,6,6,6,0,0,0,0,0,0,0,0,4
8/5/2018 17:34,hchilley40@fema.gov,Hieronymus,1993,Honda,Civic eG,207,J06,Aaron,0,0,2,2,2,2,2,2,0,4,2,2,2,2,2,2,4,2,2,0,0,0,2,2,0
8/5/2018 14:30,nnowick3d@tuttocitta.it,Nickolas,2016,Ford,Mystang,167,J02,Carl,0,0,2,2,0,2,2,0,0,0,0,2,0,2,2,2,0,0,2,0,0,0,0,0,2
8/5/2018 16:12,mdearl39@amazon.co.uk,Martin,2013,Hyundai,Gen coupe,159,J04,Bob,0,0,2,0,0,0,2,0,0,0,0,2,0,2,2,0,2,0,2,0,0,0,0,0,0
8/5/2018 17:00,alynamg@blogtalkradio.com,Aldridge,2009,Infiniti,G37,20,J06,Aaron,2,0,2,2,0,0,2,0,0,2,2,2,2,2,2,2,2,2,4,2,2,0,2,0

我的代码目前所做的是筛选csv文件,并选择car_id列、year列、make列和model列。然后,它遍历从racer_turbo到最后一列的每一列,对于每一行,它将这些列中的值相加为一个总值,并将其与其他值(id、make、model等(一起打印。打印时,还有一个排名列在其他5列之前。下面是我的代码。

BEGIN {
FS = ",";
OFS = "t";
print "Ranking", "Car_ID", "Year", "Make", "Model", "Total";
}
{
rank;
total = 0;
if(NR > 1) {
for(i = 8; i < NF; i++) {
total += $i;
}   
print ++rank,$7, $4, $5, $6, total;
}
rows[$5][total][$0]
}
END {
print "n";
print "Ranking", "Car_ID", "Year", "Make", "Model", "Total";
ranking;
PROCINFO["sorted_in"] = "@ind_str_asc"
for (m in rows) {
n = asorti(rows[m], t, "@ind_num_desc");
n = (n>3) ? 3 : n
for(i = 1; i <= n; i++) for(s in rows[m][t[i]]) {
$0 = s;
$1 = ++r;
print ++ranking, $7, $4, $5, $6, total;
}
}   
}

我想在END块中再次打印输出,但是,使用在代码的前一块中创建的total列,按照每个品牌的前三名对汽车进行排名。然而,我现在运行的代码的输出如下

Ranking Car_ID  Year    Make    Model   Total
1   48  2015    Acura   TLX 58
2   124 2015    Jeep    Wrangler    118
3   222 2015    Lexus   Is250   36
4   207 1993    Honda   Civic eG    40
5   167 2016    Ford    Mystang 18
6   159 2013    Hyundai Gen coupe   14
7   20  2009    Infiniti    G37 36
...
Ranking Car_ID  Year    Make    Model   Total
1   113 2012    Acura   Tsx sportwagon  10
2   112 2008    Acura   TL  10
3   50  2015    Acura   TLX 10
4   15  2014    Audi    S4  10
5   18  2015    Audi    S3  10
6   116 2008    Audi    A4  10
7   2   2016    Bmw M2  10
8   172 2014    Bmw 4   10
9   28  1995    Bmw 318xi   10
...

看看在第二个打印部分的合计列中,它如何显示每个打印的汽车的合计为10,而不是与第一个打印部分中每个汽车的值相同,并且显示每个品牌的最高3个合计。

以下是预期输出

Ranking Car_ID  Year    Make    Model   Total
1       48      2015    Acura   TLX     58                                                                                                                                                                                                                                                      
2       124     2015    Jeep    Wrangler        118                                                                                                                         
3       222     2015    Lexus   Is250   36                                                                                                                                  
4       207     1993    Honda   Civic eG        40                                                                                                                          
5       167     2016    Ford    Mystang 18                                                                                                                                  
6       159     2013    Hyundai Gen coupe       14                                                                                                                          
7       20      2009    Infiniti        G37     36                                                                                                                          
8       178     2009    Honda   Oddesy  66 
...
Ranking Car_ID  Year    Make    Model   Total
1       112     2008    Acura   TL      110
2       50      2015    Acura   TLX     102
3       127     2013    Acura   Tsx     86
4       15      2014    Audi    S4      120
5       18      2015    Audi    S3      38
6       116     2008    Audi    A4      28
7       2       2016    Bmw     M2      24 
8       172     2014    Bmw     4       22
9       111     2007    Bmw     328i    10
10      218     2010    Chevy   Camaro  64
11      170     2014    Chevy   Cruze   50
12      0       2015    Chevy   Camaro  0
...

用我目前的代码可以挽救吗?或者,更好的方法是创建一个单独的awk文件,对生成的输出进行排序,并生成另一个按前3名排序的文件?

我正在运行GNU AWK v4.0.2。

假设Car_ID(以下简称为id(在行之间是唯一的,请尝试:

BEGIN {
FS = ","
OFS = "t"
print "Ranking", "Car_ID", "Year", "Make", "Model", "Total"
}
{
rank
total = 0
if (NR > 1) {
for (i = 8; i < NF; i++) {
total += $i
}   
print ++rank, $7, $4, $5, $6, total
ttl[$5][$7] = total
row[$7] = $0
}
}
END {
print "n"
print "Ranking", "Car_ID", "Year", "Make", "Model", "Total"
ranking
id
PROCINFO["sorted_in"] = "@ind_str_asc"
for (m in ttl) {
n = asorti(ttl[m], t, "@val_num_desc")
n = (n>3) ? 3 : n
for (i = 1; i <= n; i++) {
id = t[i]
total = ttl[m][id]
$0 = row[id]
print ++ranking, $7, $4, $5, $6, total
}
}
}

我稍微修改了数据结构,将id指定为主键。然后创建一个二维数组ttl,该数组包含值total由CCD_ 6和CCD_。在END循环中,我们可以检索使用CCD_ 9
附带说明一下,您的原始数据结构使用total作为索引。如果具有相同值的多行恰好具有相同值则total的任何一个索引都将被覆盖。

相关内容

  • 没有找到相关文章

最新更新