Find and remove repeated words in a column



I have a file with the following data:

AND (CP),(D),(SE),(SI),(CP),(D),(SE),(SI)            (Q),(Q)    1
OR  (CP),(D),(E),(SE),(SI),(CP),(D),(E),(SE),(SI)    (Q),(Q)    1
DFF (CP),(D),(E),(CP),(D),(E)                        (QN),(QN)  1

I want the output to be:

AND (CP),(D),(SE),(SI)          (Q)  1
OR  (CP),(D),(E),(SE),(SI)      (Q)  1
DFF (CP),(D),(E)                (QN) 1

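I want to delete the repeated terms in column 2 and column 3. For example, in the second column of the first row, CP, D, SE and SI appear a second time, so the repeats should be deleted. Likewise, Q is repeated in the third column, so the duplicate should be deleted.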

I tried using awk:

awk '!seen[$2]++' file 

But I get the error: cannot find [ ]

You can use the following awk command:

awk 'function dedup(col,   a, seen, i, s) {split($col, a, /,/); s=""; for (i=1; i in a; ++i) if (!seen[a[i]]++) s = s (s == "" ? "" : ",") a[i]; $col=s;} {dedup(2); dedup(3)} 1' file | column -t
AND  (CP),(D),(SE),(SI)      (Q)   1
OR   (CP),(D),(E),(SE),(SI)  (Q)   1
DFF  (CP),(D),(E)            (QN)  1

Expanded form:

awk 'function dedup(col,   a, seen, i, s) {
    split($col, a, /,/)
    s = ""
    for (i=1; i in a; ++i)
        if (!seen[a[i]]++)
            s = s (s == "" ? "" : ",") a[i]
    $col = s
}
{
    dedup(2)
    dedup(3)
} 1' file | column -t

column -t is used only to align the output as a table.
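As a small illustration of why the alignment step is useful, here is a sketch that runs the same dedup function without column -t. It assumes a POSIX shell and any standard awk; the one-line sample and the variable names line and out are made up for the example. Assigning to a field makes awk rebuild the record with the output field separator (a single space by default), so the original column padding is lost.

```shell
# Hypothetical one-line sample; dedup is the function from the answer above.
line='AND (CP),(D),(CP),(D)      (Q),(Q)    1'

out=$(printf '%s\n' "$line" | awk '
function dedup(col,   a, seen, i, s) {
    split($col, a, /,/)
    s = ""
    for (i = 1; i in a; ++i)
        if (!seen[a[i]]++)
            s = s (s == "" ? "" : ",") a[i]
    $col = s          # assigning to a field rebuilds $0 using OFS
}
{ dedup(2); dedup(3) } 1')

printf '%s\n' "$out"
```

The fields come out separated by single spaces, which is why the answer pipes the result through column -t.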

If the repeated part is always exactly identical and appears exactly twice, you can use sed:

sed -E 's/ (.+),\1 / \1 /g' file
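The sed approach depends on a backreference (\1) matching an exact second copy of the list, with a space on each side. The sketch below probes where that assumption holds. It assumes GNU sed (other sed implementations may treat backreferences in -E patterns differently), and the helper name dedup_pairs and the sample lines are illustrative only.

```shell
# The substitution from the answer, wrapped in a helper (the name is illustrative).
dedup_pairs() { sed -E 's/ (.+),\1 / \1 /g'; }

# Exact doubling, columns separated by two or more spaces: both columns collapse.
doubled=$(printf '%s\n' 'AND (CP),(D),(CP),(D)   (Q),(Q)  1' | dedup_pairs)

# A repeat that is not an exact doubling is left unchanged.
partial=$(printf '%s\n' 'X (A),(B),(A)  (Q)  1' | dedup_pairs)

# Columns separated by a single space: the space that ends the first match
# cannot also start the second one, so column 3 is left as it was.
single=$(printf '%s\n' 'AND (CP),(D),(CP),(D) (Q),(Q) 1' | dedup_pairs)

printf '%s\n' "$doubled" "$partial" "$single"
```

For input that may contain partial repeats or single-space separators, the awk approach is the safer choice, since it checks each comma-separated item individually.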

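Based on the samples you have shown, please try the following, written and tested in GNU awk. It defines a function named removeDup. Pass it, inside double quotes, all the field numbers in which you want duplicates removed (for example "2,3" to remove duplicates in the 2nd and 3rd fields), and you are all set.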

awk '
BEGIN{ s1="," }
function removeDup(fields){
  num=split(fields,fieldNum,",")
  for(k=1;k<=num;k++){
    delete arr1
    delete arrVal1
    val1=num1=""
    num1=split($fieldNum[k],arr1,",")
    for(i=1;i<=num1;i++){
      if(!arrVal1[arr1[i]]++){
        val1=(val1?val1 s1:"")arr1[i]
      }
    }
    $fieldNum[k]=val1
  }
}
{
  removeDup("2,3")
}
1
' Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                                       ##Start the awk program here.
BEGIN{ s1="," }                             ##Set s1 to a comma in the BEGIN section.
function removeDup(fields){                 ##Define function removeDup, which takes the list of field numbers.
  num=split(fields,fieldNum,",")            ##Split fields into the fieldNum array; num is the number of entries.
  for(k=1;k<=num;k++){                      ##Loop over every field number in fieldNum.
    delete arr1                             ##Clear arr1.
    delete arrVal1                          ##Clear arrVal1.
    val1=num1=""                            ##Reset val1 and num1.
    num1=split($fieldNum[k],arr1,",")       ##Split the current field (numbered fieldNum[k]) into arr1.
    for(i=1;i<=num1;i++){                   ##Loop over every item of the current field.
      if(!arrVal1[arr1[i]]++){              ##If the current item has not been seen yet in arrVal1, do the following.
        val1=(val1?val1 s1:"")arr1[i]       ##Append the item to val1, separated by s1.
      }
    }
    $fieldNum[k]=val1                       ##Assign val1 back to the current field.
  }
}
{
  removeDup("2,3")                          ##Call removeDup in the main block for the 2nd and 3rd fields.
}
1                                           ##Print the (possibly modified) line.
' Input_file                                ##Name of the input file.
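As a possible variation on the same idea, the sketch below passes the field list in from the shell with -v and keeps all working variables local to the function, so nothing leaks into the global awk namespace. It is an illustration only, not the answerer's code: the wrapper name dedup_fields, the awk variable fields and the sample line are assumptions made for this example. It uses split("", seen) to empty the array, which should also work in awk implementations that lack whole-array delete.

```shell
# Sketch: same technique as removeDup, with the field list supplied by the caller.
dedup_fields() {
    awk -v fields="$1" '
    function removeDup(list,    n, k, m, i, f, parts, seen, out) {
        n = split(list, f, ",")             # field numbers to process
        for (k = 1; k <= n; k++) {
            split("", seen)                 # empty the seen-set for each field
            out = ""
            m = split($(f[k]), parts, ",")  # items of the current field
            for (i = 1; i <= m; i++)
                if (!seen[parts[i]]++)      # keep only the first occurrence
                    out = (out == "" ? "" : out ",") parts[i]
            $(f[k]) = out
        }
    }
    { removeDup(fields) } 1'
}

sample='OR (CP),(D),(E),(CP),(D),(E)  (Q),(Q)  1'
only2=$(printf '%s\n' "$sample" | dedup_fields 2)
both=$(printf '%s\n' "$sample" | dedup_fields 2,3)
printf '%s\n' "$only2" "$both"
```

With 2 as the argument only the second field is deduplicated; with 2,3 both fields are. As in the other awk answers, the output fields are separated by single spaces unless the result is piped through column -t.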
