Update: an additional requirement is that fields 2 and 4 must be sorted and de-duplicated, as with field 2 for Zurich.
Data file:
Istanbul;J;TK;13;OK
London;C;EN;28;OK
London;K;EN;32;OK
Paris;A;FR;30;OK
Paris;B;FR;40;OK
Zurich;G;DE;99;OK
Zurich;H;DE;33;OK
Zurich;G;DE;82;OK
Expected output:
Istanbul;J;TK;13;OK
London;C-K;EN;28-32;OK
Paris;A-B;FR;30-40;OK
Zurich;G-H;DE;33-82-99;OK
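The first field of each line is the key: when it repeats, fields 2 and 4 are merged, and field 5 takes the value from the first occurrence.
My code so far (the values in fields 2 and 4 still need to be sorted and de-duplicated, as with Zurich):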
awk -F';' -v OFS=';' '{getline nx; j=split (nx, Ax); for (i=1;i<=j;i++) $i=$i Ax[i]}1' data.file
This obviously does not work as expected; it returns this horrible output:
ParisParis;AB;FRFR;3040;OKOK
LondonLondon;CK;ENEN;2832;OKOK
IstanbulZurich;JZ;TKDE;1382;OKOK
ZurichZurich;GH;DEDE;9933;OKOK
Using GNU awk for arrays of arrays and sorted_in:
$ cat tst.awk
BEGIN { FS=OFS=";" }
$1 != prev {
    if ( NR>1 ) {
        prt()
    }
    prev = $1
    delete vals
}
{
    for ( fldNr=1; fldNr<=NF; fldNr++ ) {
        vals[fldNr][$fldNr]
    }
}
END { prt() }

function prt(   fldNr,val,sep) {
    for ( fldNr=1; fldNr<=NF; fldNr++ ) {
        PROCINFO["sorted_in"] = "@ind_" (fldNr==4 ? "num" : "str") "_asc"
        sep = ""
        for ( val in vals[fldNr] ) {
            printf "%s%s", sep, val
            sep = "-"
        }
        printf "%s", (fldNr<NF ? OFS : ORS)
    }
}
$ awk -f tst.awk data.file
Istanbul;J;TK;13;OK
London;C-K;EN;28-32;OK
Paris;A-B;FR;30-40;OK
Zurich;G-H;DE;33-82-99;OK
Assumptions:
- all rows for a given city appear on consecutive lines, so as soon as we see a "new" city we can print the "old" city's data to stdout
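If the input is not guaranteed to be grouped like this, one option is to sort it on field 1 before handing it to awk. A minimal sketch, where group_cities is a hypothetical helper name and not part of the original answer. Note that plain sort breaks ties between rows of the same city by comparing the whole line, so the original row order within a city is not preserved:

```shell
# group_cities (hypothetical helper): bring all rows for one city together
# by sorting ;-separated input on field 1 only.
group_cities() {
    sort -t';' -k1,1 "$@"
}

# Example: the two Zurich rows end up adjacent.
printf '%s\n' 'Zurich;G;DE;99;OK' 'Paris;A;FR;30;OK' 'Zurich;H;DE;33;OK' | group_cities
```

Its output can then be piped into the awk program in place of reading data.file directly.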
One awk idea:
awk '
function printline() {
    if (flds[1]) {                                     # if the previous city is non-blank then ...
        for (i=1;i<=NF;i++)                            # loop through list of fields and ...
            printf "%s%s", (i==1 ? "" : OFS), flds[i]  # print to stdout
        print ""                                       # terminate the printf output with a linefeed
    }
    delete flds                                        # delete all data for the previous city
}

BEGIN { FS=OFS=";" }

$1 != flds[1] { printline()                            # if this is a new city then print the previous city and then ...
                for (i=1;i<=NF;i++)                    # capture all of the current fields
                    flds[i]=$i
                next
              }

              { for (i=2;i<NF;i=i+2)                   # if this is a repeat city then process the 2nd and 4th fields by ...
                    flds[i]=flds[i] "-" $i             # appending the current values to the previous value(s)
              }

END           { printline() }                          # print the last city
' data.file
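This generates (values in fields 2 and 4 are appended in input order, without sorting or de-duplication):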
Istanbul;J;TK;13;OK
London;C-K;EN;28-32;OK
Paris;A-B;FR;30-40;OK
Zurich;G-H-G;DE;99-33-82;OK
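The approach above appends values as they arrive. One way to also meet the sort and de-duplicate requirement without GNU extensions is to insert each new value into its sorted position in the list. The following is only a sketch under stated assumptions: merge_cities, add and before are hypothetical names, rows for a city are assumed to be consecutive, every row is assumed to have five fields, and values are assumed not to contain "-" themselves. Field 2 is compared as strings and field 4 as numbers.

```shell
# merge_cities (hypothetical helper): merge consecutive rows with the same city,
# keeping fields 2 and 4 sorted and free of duplicates. Uses POSIX awk only.
merge_cities() {
    awk '
    # before(a, b, numeric): true when a sorts strictly ahead of b
    function before(a, b, numeric) {
        return numeric ? (a + 0 < b + 0) : ((a "") < (b ""))
    }
    # add(list, val, numeric): insert val into the sorted "-"-joined list,
    # leaving the list unchanged when val is already present
    function add(list, val, numeric,    n, parts, i, out, placed) {
        if (list == "") return val
        n = split(list, parts, "-")
        out = ""; placed = 0
        for (i = 1; i <= n; i++) {
            if (!placed && (parts[i] "") == (val "")) {
                placed = 1                          # duplicate: keep the existing entry
            } else if (!placed && before(val, parts[i], numeric)) {
                out = out (out == "" ? "" : "-") val
                placed = 1
            }
            out = out (out == "" ? "" : "-") parts[i]
        }
        return placed ? out : (out "-" val)         # largest so far: append at the end
    }
    BEGIN { FS = OFS = ";" }
    $1 != city {                                    # new city: flush the previous one
        if (NR > 1) print city, f2, f3, f4, f5
        city = $1; f2 = $2; f3 = $3; f4 = $4; f5 = $5   # fields 3 and 5: first occurrence
        next
    }
    { f2 = add(f2, $2, 0); f4 = add(f4, $4, 1) }    # repeat city: merge fields 2 and 4
    END { if (NR > 0) print city, f2, f3, f4, f5 }
    ' "$@"
}
```

Called as merge_cities data.file, this should produce the expected output from the question, including Zurich;G-H;DE;33-82-99;OK. Re-splitting the list on every insert is quadratic in the number of values per city, which is acceptable for short lists but not for very long ones.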
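Another GNU awk option, using arrays of arrays and asorti():
awk -F';' '
BEGIN { OFS=";" }
{
    # fields 3 and 5 come from the first occurrence of each city
    if (!($1 in a)) { a[$1][3]=$3; a[$1][5]=$5 }
    # storing fields 2 and 4 as array indices removes duplicates
    a[$1][2][$2]; a[$1][4][$4]
}
END {
    for (i in a) {                          # traversal order is unspecified, hence the final sort
        n = asorti(a[i][2], a2)             # indices of field 2, sorted as strings
        for (x=n; x>0; x--) o2 = sprintf("%s-%s", a2[x], o2)
        n = asorti(a[i][4], a4)             # field 4 is also compared as strings here
        for (x=n; x>0; x--) o4 = sprintf("%s-%s", a4[x], o4)
        # substr() drops the trailing "-" left by the loops above
        print i, substr(o2,1,length(o2)-1), a[i][3], substr(o4,1,length(o4)-1), a[i][5]
        o2=o4=""
    }
}' data.file | sort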
Istanbul;J;TK;13;OK
London;C-K;EN;28-32;OK
Paris;A-B;FR;30-40;OK
Zurich;G-H;DE;33-82-99;OK
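One caveat with the asorti() calls above: by default asorti() compares indices as strings, which happens to give the right order for the two-digit values in the sample data but would not for values of differing length. gawk's asorti() accepts an optional third argument such as "@ind_num_asc" to request numeric ordering. The difference between the two orderings can be illustrated with the sort utility (the value 100 is made up for the illustration and is not in the original data):

```shell
# String order compares character by character, so "100" lands before "33".
printf '%s\n' 100 33 82 | sort

# Numeric order compares the values as numbers.
printf '%s\n' 100 33 82 | sort -n
```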