file.txt
test (CODE:700|SIZE:2356)
asdasdad (CODE:700|SIZE:124)
xcvxcva (CODE:700|SIZE:8974)
asdavasdasdasd (CODE:700|SIZE:124)
link-categories (CODE:700|SIZE:8974)
edit (CODE:700|SIZE:124)
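I need a command that finds all duplicate SIZE: values and then deletes all but one of the lines sharing each value. The output should look like this: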
test (CODE:700|SIZE:2356)
xcvxcva (CODE:700|SIZE:8974)
asdavasdasdasd (CODE:700|SIZE:124)
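I found this command in "Delete duplicate lines only containing a specific string":
sed '/SIZE:124/,+1 d' file.txt
But this command deleted all the lines, whereas I need to delete the duplicate lines except one. It also does not search for duplicate SIZE: values, so it does not work.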
What I need is:
- Search for duplicate SIZE: values, such as 124 above
- Delete all lines that have such a value except one line (or two, if possible)
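The first requirement, finding which SIZE: values are repeated, can be checked on its own before deleting anything. Here is a minimal sketch, assuming the sample lines from the question; the function name list_duplicate_sizes and the temporary file are made up for illustration:

```shell
# Sample data copied from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
test (CODE:700|SIZE:2356)
asdasdad (CODE:700|SIZE:124)
xcvxcva (CODE:700|SIZE:8974)
asdavasdasdasd (CODE:700|SIZE:124)
link-categories (CODE:700|SIZE:8974)
edit (CODE:700|SIZE:124)
EOF

# list_duplicate_sizes (hypothetical name): print each SIZE value that occurs
# on more than one line, followed by the number of lines carrying it.
list_duplicate_sizes() {
    awk 'match($0, /SIZE:[0-9]+/) { count[substr($0, RSTART, RLENGTH)]++ }
         END { for (v in count) if (count[v] > 1) print v, count[v] }' "$1" | sort
}

list_duplicate_sizes "$input"
```

For the sample file this prints SIZE:124 3 and SIZE:8974 2. The final sort is there only because awk does not guarantee any order for "for (v in count)".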
This can be done with a simple awk command that prints only the first line seen for each SIZE: value:
awk -F '[ |]+' '!seen[$NF]++{print}' file
test (CODE:700|SIZE:2356)
asdasdad (CODE:700|SIZE:124)
xcvxcva (CODE:700|SIZE:8974)
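The question also asks about keeping one or two lines per value. A small variation on the same idea might cover that: compare the counter against a limit instead of testing it for zero. This is only a sketch; the helper name keep_up_to and the max variable are assumptions for illustration, not part of the original answer:

```shell
# Sample data copied from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
test (CODE:700|SIZE:2356)
asdasdad (CODE:700|SIZE:124)
xcvxcva (CODE:700|SIZE:8974)
asdavasdasdasd (CODE:700|SIZE:124)
link-categories (CODE:700|SIZE:8974)
edit (CODE:700|SIZE:124)
EOF

# keep_up_to (hypothetical name): print at most $1 lines per SIZE value,
# in input order. $2 is the file to read.
keep_up_to() {
    # With -F '[ |]+' the last field ($NF) is the "SIZE:nnn)" part of the line.
    # seen[$NF]++ yields the number of earlier lines with the same value.
    awk -F '[ |]+' -v max="$1" 'seen[$NF]++ < max' "$2"
}

keep_up_to 2 "$input"
```

With a limit of 2, every sample line except "edit (CODE:700|SIZE:124)" is printed, since that is the third line with SIZE:124. With a limit of 1 the result is the same as the command above.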
Could you please try the following.
awk 'match($0,/SIZE:[0-9]+/){val=substr($0,RSTART,RLENGTH);array[val]=$0;val=""} END{for(key in array){print array[key]}}' Input_file
Or, the same solution in a non-one-liner form:
awk '
match($0,/SIZE:[0-9]+/){
  val=substr($0,RSTART,RLENGTH)
  array[val]=$0
  val=""
}
END{
  for(key in array){
    print array[key]
  }
}
' Input_file
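Explanation: a detailed explanation of the above code. Note that it keeps the last line seen for each SIZE: value, and that for(key in array) visits the keys in an unspecified order, so the output order may differ from the input order.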
awk '                                ##Start the awk program.
match($0,/SIZE:[0-9]+/){             ##Match SIZE: followed by digits on the current line.
  val=substr($0,RSTART,RLENGTH)      ##Store the matched text (for example SIZE:124) in val.
  array[val]=$0                      ##Store the current line in array under index val, overwriting any earlier line.
  val=""                             ##Reset val.
}
END{                                 ##Start the END block, which runs after all input is read.
  for(key in array){                 ##Loop over every index in array.
    print array[key]                 ##Print the line stored under that index.
  }
}
' Input_file                         ##Name of the input file.
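If the output order matters, one possible variation records the order in which each SIZE: value first appears and prints in that order. Like the code above, it still keeps the last line per value. This is a sketch only; the function name dedupe_keep_last and the arrays last and order are illustrative choices, not part of the original answer:

```shell
# Sample data copied from the question, written to a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
test (CODE:700|SIZE:2356)
asdasdad (CODE:700|SIZE:124)
xcvxcva (CODE:700|SIZE:8974)
asdavasdasdasd (CODE:700|SIZE:124)
link-categories (CODE:700|SIZE:8974)
edit (CODE:700|SIZE:124)
EOF

# dedupe_keep_last (hypothetical name): keep the last line for each SIZE value,
# printed in the order in which the values first appear in the input.
dedupe_keep_last() {
    awk '
    match($0, /SIZE:[0-9]+/) {
        val = substr($0, RSTART, RLENGTH)
        if (!(val in last)) order[++n] = val   # remember first-appearance order
        last[val] = $0                         # later lines overwrite earlier ones
    }
    END {
        for (i = 1; i <= n; i++) print last[order[i]]
    }
    ' "$1"
}

dedupe_keep_last "$input"
```

For the sample file this prints the test, edit and link-categories lines, in that order. Looping over a numeric index avoids relying on the unspecified order of for(key in array).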