How to use awk to merge only specific fields across different lines

Data file:

Istanbul;J;TK;13;OK
London;C;EN;28;OK
London;K;EN;32;OK
Paris;A;FR;30;OK
Paris;B;FR;40;OK
Zurich;G;DE;99;OK
Zurich;H;DE;33;OK
Zurich;G;DE;82;OK

Expected output:

Istanbul;J;TK;13;OK
London;C-K;EN;28-32;OK
Paris;A-B;FR;30-40;OK
Zurich;G-H;DE;33-82-99;OK

The first field of each line is the key: when it repeats, merge fields 2 and 4; for field 5, use only the first occurrence.

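~~UPDATE: another condition is that fields 2 and 4 must be sorted and de-duplicated, as in field 2 for Zurich~~

In fields 2 and 4 the data must be sorted and duplicates removed, as for Zurich. My code so far: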

awk -F';' -v OFS=';' '{getline nx; j=split (nx, Ax); for (i=1;i<=j;i++) $i=$i Ax[i]}1' data.file

This obviously does not work as expected; it returns something horrible:

ParisParis;AB;FRFR;3040;OKOK
LondonLondon;CK;ENEN;2832;OKOK
IstanbulZurich;JZ;TKDE;1382;OKOK
ZurichZurich;GH;DEDE;9933;OKOK
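
The pairing in that output comes from getline: each call consumes the following record, so lines are glued together two at a time regardless of the city. A minimal sketch of that behavior, using a hypothetical four-line input rather than the real data file:

```shell
# Hypothetical input (not the data file from the question).
# "getline nx" reads the NEXT record into nx, so each pass through the
# main rule handles two input lines at once.
paired=$(printf '%s\n' 'a;1' 'b;2' 'c;3' 'd;4' |
    awk -F';' '{ getline nx; print $0 " + " nx }')
printf '%s\n' "$paired"
```

Every odd line is paired with the even line after it, whether or not their first fields match.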

Using GNU awk for arrays of arrays and sorted_in:

$ cat tst.awk
BEGIN { FS=OFS=";" }
$1 != prev {
    if ( NR>1 ) {
        prt()
    }
    prev = $1
    delete vals
}
{
    for ( fldNr=1; fldNr<=NF; fldNr++ ) {
        vals[fldNr][$fldNr]
    }
}
END { prt() }
function prt(           fldNr,val,sep) {
    for ( fldNr=1; fldNr<=NF; fldNr++ ) {
        PROCINFO["sorted_in"] = "@ind_" (fldNr==4 ? "num" : "str") "_asc"
        sep = ""
        for ( val in vals[fldNr] ) {
            printf "%s%s", sep, val
            sep = "-"
        }
        printf "%s", (fldNr<NF ? OFS : ORS)
    }
}

$ awk -f tst.awk data.file
Istanbul;J;TK;13;OK
London;C-K;EN;28-32;OK
Paris;A-B;FR;30-40;OK
Zurich;G-H;DE;33-82-99;OK

Assumptions:

All lines for a given city appear on consecutive lines, so as soon as we see a "new" city we can print the "old" city's data to stdout.
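
If the input is not already grouped by city, one option is to sort it on the first field before it reaches awk. A minimal sketch, assuming a sort that supports the stable -s flag (GNU and BSD sort do; POSIX does not require it) and a hypothetical three-line ungrouped input:

```shell
# Hypothetical ungrouped input: the two London lines are not adjacent.
# -t';'  use ";" as the field separator
# -k1,1  sort on the city field only
# -s     stable sort: keep the original order of lines within one city
sorted=$(printf '%s\n' 'London;C;EN;28;OK' 'Paris;A;FR;30;OK' 'London;K;EN;32;OK' |
    sort -t';' -k1,1 -s)
printf '%s\n' "$sorted"
```

The sorted stream can then be piped into an awk script that relies on consecutive lines.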

One awk idea:

awk '
function printline() {
    if (flds[1]) {                                    # if the previous city is non-blank then ...
       for (i=1;i<=NF;i++)                            # loop through list of fields and ...
           printf "%s%s", (i==1 ? "" : OFS), flds[i]  # print to stdout
       print ""                                       # terminate the printf output with a linefeed
    }
    delete flds                                       # delete all data for the previous city
}
BEGIN         { FS=OFS=";" }
$1 != flds[1] { printline()                           # if this is a new city then print the previous city and then ...
                for (i=1;i<=NF;i++)                   # capture all of the current fields
                    flds[i]=$i
                next
              }
              { for (i=2;i<NF;i=i+2)                  # if this is a repeat city then process the 2nd and 4th fields by ...
                    flds[i]=flds[i] "-" $i            # appending the current values to the previous value(s)
              }
END           { printline() }                         # print the last city
' data.file

This generates (values are appended in input order, neither sorted nor de-duplicated):

Istanbul;J;TK;13;OK
London;C-K;EN;28-32;OK
Paris;A-B;FR;30-40;OK
Zurich;G-H-G;DE;99-33-82;OK
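
The same consecutive-lines grouping can also sort and de-duplicate fields 2 and 4 without GNU extensions. The following is only a sketch under that assumption, not a replacement for the answers here: the helper names add, joined and emit are made up for illustration, and a simple insertion sort stands in for gawk's sorted_in.

```shell
data='Istanbul;J;TK;13;OK
London;C;EN;28;OK
London;K;EN;32;OK
Paris;A;FR;30;OK
Paris;B;FR;40;OK
Zurich;G;DE;99;OK
Zurich;H;DE;33;OK
Zurich;G;DE;82;OK'

merged=$(printf '%s\n' "$data" | awk '
BEGIN { FS = OFS = ";" }

# Remember val for field f unless this city already has it.
function add(f, val) {
    if (!((f, val) in seen)) {
        seen[f, val] = 1
        list[f, ++cnt[f]] = val
    }
}

# Insertion-sort the stored values of field f (numerically when
# numeric is 1, as strings otherwise) and return them joined by "-".
function joined(f, numeric,    i, j, tmp, out) {
    for (i = 2; i <= cnt[f]; i++) {
        tmp = list[f, i]
        for (j = i - 1; j >= 1 && (numeric ? list[f, j] + 0 > tmp + 0 : list[f, j] "" > tmp ""); j--)
            list[f, j + 1] = list[f, j]
        list[f, j + 1] = tmp
    }
    for (i = 1; i <= cnt[f]; i++)
        out = out (i > 1 ? "-" : "") list[f, i]
    return out
}

# Print the city collected so far, then reset the per-city state.
function emit() {
    if (have)
        print city, joined(2, 0), f3, joined(4, 1), f5
    split("", seen); split("", list); split("", cnt)
}

# A new city starts: flush the old one, keep fields 3 and 5 from its first line.
!have || $1 != city { emit(); city = $1; f3 = $3; f5 = $5; have = 1 }
{ add(2, $2); add(4, $4) }
END { emit() }
')
printf '%s\n' "$merged"
```

With the data file from the question this prints the expected output, including Zurich;G-H;DE;33-82-99;OK.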

Another GNU awk option, using arrays of arrays and asorti(), with the result piped through sort:

awk -F';' '
BEGIN { OFS=";" }
{
    a[$1][2][$2]; a[$1][3]=$3; a[$1][4][$4]; a[$1][5]=$5
}
END {
    for (i in a) {
        n = asorti(a[i][2], a2)
        for (x=n; x>0; x--) o2 = sprintf("%s-%s", a2[x], o2)
        n = asorti(a[i][4], a4)
        for (x=n; x>0; x--) o4 = sprintf("%s-%s", a4[x], o4)
        print i, substr(o2,1,length(o2)-1), a[i][3], substr(o4,1,length(o4)-1), a[i][5]
        o2=o4=""
    }
}' file | sort
Istanbul;J;TK;13;OK
London;C-K;EN;28-32;OK
Paris;A-B;FR;30-40;OK
Zurich;G-H;DE;33-82-99;OK
