Delete duplicate lines containing an unknown string



file.txt

test (CODE:700|SIZE:2356)
asdasdad (CODE:700|SIZE:124)
xcvxcva (CODE:700|SIZE:8974)
asdavasdasdasd (CODE:700|SIZE:124)
link-categories (CODE:700|SIZE:8974)
edit (CODE:700|SIZE:124)

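I need a command that finds all duplicate SIZE: values and then deletes every duplicate line except one. In other words, the output should look like this: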

test (CODE:700|SIZE:2356)
xcvxcva (CODE:700|SIZE:8974)
asdavasdasdasd (CODE:700|SIZE:124)

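While searching for how to delete duplicate lines, I found this command, sed '/SIZE:124/,+1 d' file.txt, but it only works for one specific string.

Also, this command deletes every matching line (and, because of ,+1, the line after each one), whereas I need to delete the duplicates except one. On top of that, it does not search for duplicate SIZE: values by itself, so it doesn't work for me!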
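As a first step toward "search for duplicate SIZE: values", a counting pass can list which values repeat and how often. This is a minimal sketch, assuming every relevant line contains SIZE: followed by digits; the function name list_dup_sizes is hypothetical.

```shell
# Hypothetical helper: print each SIZE: value that occurs on more than one line,
# followed by the number of lines carrying it. Reads the files given as
# arguments, or stdin when none are given.
list_dup_sizes() {
  awk '
    # match() sets RSTART/RLENGTH for the first "SIZE:<digits>" on the line
    match($0, /SIZE:[0-9]+/) {
      # skip the 5 characters of "SIZE:" so only the number is used as the key
      count[substr($0, RSTART + 5, RLENGTH - 5)]++
    }
    END {
      for (size in count)
        if (count[size] > 1)
          print size, count[size]
    }
  ' "$@" | sort -n   # for-in order is unspecified, so sort for stable output
}
```

For the sample file.txt, list_dup_sizes file.txt should report 124 (3 lines) and 8974 (2 lines).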

What I need:

  • Search for duplicate SIZE: values, such as 124 above
  • Delete every line with that value except one (or two, if possible)
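The "one or two" requirement can be expressed with a per-value counter. A minimal sketch, assuming the first lines for each value are the ones to keep and that lines without a SIZE: field should pass through unchanged (both are assumptions, not stated in the question); keep_n_per_size is a hypothetical name.

```shell
# Hypothetical helper: keep at most MAX lines for each SIZE: value (the first
# ones win, input order is preserved); lines without SIZE: are printed as-is.
# Usage: keep_n_per_size MAX [file ...]
# The body runs in a subshell ( ... ) so that "max" does not leak to the caller.
keep_n_per_size() (
  max=$1
  shift
  awk -v max="$max" '
    match($0, /SIZE:[0-9]+/) {
      key = substr($0, RSTART, RLENGTH)   # e.g. "SIZE:124"
      if (++seen[key] > max)
        next                              # already printed max lines for this value
    }
    { print }
  ' "$@"
)
```

With the sample data, keep_n_per_size 1 file.txt keeps the test, asdasdad and xcvxcva lines; keep_n_per_size 2 file.txt additionally keeps the asdavasdasdasd and link-categories lines.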

This can also be done with a simple awk command, which keeps the first line for each SIZE: value:

awk -F '[ |]+' '!seen[$NF]++{print}' file

test (CODE:700|SIZE:2356)
asdasdad (CODE:700|SIZE:124)
xcvxcva (CODE:700|SIZE:8974)
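The one-liner works because, with runs of spaces and | as the field separator, the last field $NF of each sample line is the SIZE: part, including the closing parenthesis. The sketch below only prints that key so it can be inspected; it assumes SIZE: is always the last field, which holds for the sample but may not for other input. show_nf_key is a hypothetical name.

```shell
# Hypothetical helper: show the dedup key that awk -F '[ |]+' sees for each line.
# For "test (CODE:700|SIZE:2356)" the fields are "test", "(CODE:700" and
# "SIZE:2356)", so $NF is "SIZE:2356)".
show_nf_key() {
  awk -F '[ |]+' '{ print NR ": " $NF }' "$@"
}
```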

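Please try the following. Note that it keeps the last line for each SIZE: value, and since for (key in array) visits keys in an unspecified order, the output may not follow the input order.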

awk 'match($0,/SIZE:[0-9]+/){val=substr($0,RSTART,RLENGTH);array[val]=$0;val=""} END{for(key in array){print array[key]}}' Input_file

Or, the same solution in non-one-liner form:

awk '
match($0,/SIZE:[0-9]+/){
  val=substr($0,RSTART,RLENGTH)
  array[val]=$0
  val=""
}
END{
  for(key in array){
    print array[key]
  }
}
' Input_file
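Because array[val]=$0 overwrites earlier entries and for (key in array) has no defined order, this solution keeps the last line per value but may print the survivors in any order. If input order matters, one option is to also record the order in which each value first appears. A sketch under that assumption, with the hypothetical name keep_last_per_size; like the original, it drops lines that contain no SIZE: field.

```shell
# Hypothetical helper: keep the LAST line seen for each SIZE: value, but print
# the survivors in the order their value first appeared, instead of relying on
# the unspecified order of for (key in array).
keep_last_per_size() {
  awk '
    match($0, /SIZE:[0-9]+/) {
      key = substr($0, RSTART, RLENGTH)
      if (!(key in line))
        order[++n] = key          # remember first-appearance order
      line[key] = $0              # later lines overwrite earlier ones
    }
    END {
      for (i = 1; i <= n; i++)
        print line[order[i]]
    }
  ' "$@"
}
```

On the sample file this should print the test, edit and link-categories lines, in that order.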

Explanation: here is a detailed, commented version of the code above.

awk '                                  ##Start the awk program.
match($0,/SIZE:[0-9]+/){               ##Use match() to find "SIZE:" followed by digits on each line.
  val=substr($0,RSTART,RLENGTH)        ##Store the matched text (e.g. SIZE:124) in variable val.
  array[val]=$0                        ##Save the current line in array, indexed by val; a later line with the same value overwrites it, so the last one is kept.
  val=""                               ##Reset val.
}
END{                                   ##Start the END block, which runs after all input has been read.
  for(key in array){                   ##Loop over the array (the iteration order is unspecified).
    print array[key]                   ##Print the line stored for each SIZE: value.
  }
}
' Input_file                           ##Name of the input file.
