Remove words that have a match, but keep the original word in the file

I am trying to achieve the following:

I have a file containing multiple words, for example:

The contents of sample.txt are:

testStr
testmystring
testmystring_1
testmystringwq
testStr_3
testStrasd
testStr-345
testStr1
testingStr1

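What I am trying to achieve is this: if I process the file line by line, so that the first string used is testStr, then every word that starts with testStr should be removed, but testStr itself should be kept, i.e.

The expected output is: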

testStr
testmystring
testmystring_1
testmystringwq
testingStr1

Next, the following string in the file, testmystring, should be compared in the same way. The expected output then becomes:

testStr
testmystring
testingStr1

and so on...

I tried deleting by pattern with sed, and that works. However, I need the original pattern to remain in the file.

sed -i '/testStr*/d' ./sample.txt

This might work for you (GNU sed):

sed 's/\<\(testStr\)\S*/\1/;H;$!d;x;s/.//;:a;s/\<\(\(testStr\n\).*\)\2/\1/;ta' file

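Remove all characters that follow the string testStr within a word. Store the result, along with the unchanged lines, in the hold space. At the end of the file, remove the introduced leading newline and then delete every occurrence of the string testStr except the first.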
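
For readers who find the one-liner hard to follow, here is a sketch of one possible reading of it, written as a commented multi-line sed script. It assumes GNU sed (\<, \S and \n inside the regex are GNU extensions), and the function name keep_first_teststr is made up for illustration only.

```shell
# Hypothetical wrapper; reads the word list from standard input.
keep_first_teststr() {
    sed '
      # Cut every word that starts with testStr down to plain testStr.
      s/\<\(testStr\)\S*/\1/
      # Append the line to the hold space and print nothing until the last line.
      H
      $!d
      # Last line: fetch the collected text and drop the leading newline.
      x
      s/.//
      # Loop: delete a later "testStr<newline>" that repeats an earlier one.
      :a
      s/\<\(\(testStr\n\).*\)\2/\1/
      ta
    '
}

printf '%s\n' testStr testmystring testmystring_1 testmystringwq \
    testStr_3 testStrasd testStr-345 testStr1 testingStr1 | keep_first_teststr
```

On the sample input this should print testStr, testmystring, testmystring_1, testmystringwq and testingStr1, which is the first expected output in the question. Note that a duplicate sitting on the very last line has no trailing newline, so the loop would not remove it.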

N.B. A simpler solution might be:

sed 's/\<\(testStr\)\S*/\1/' file | sort -u

However, this also removes duplicate lines other than testStr, and it may change the original order.

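EDIT: To accommodate the changes to the original question, two files are now involved. The first, original file contains the strings to be tested (file); the new second file contains only the strings to match (fileInput).

Using the solution above together with alternation, build a sed script from fileInput:

sed 'H;$!d;x;s/.//;s/\n/|/g;s#.*#s/\\<(&)\\S*/\\1/;H;$!d;x;s/.//;:a;s/\\<(((&)\\n).*)\\2/\\1/;ta#' fileInput |
sed -Ef - file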

How about this?

$ grep -Evf <(sed 's/^/^/; s/$/.+/' sample.txt) sample.txt
testStr
testmystring
testingStr1

(Requires bash, zsh, ksh93, or another shell that understands <(command)-style redirection.)

Here is how to do what you asked for using literal strings:
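
If your shell lacks <(command), a plain pipe may do the same job. This is only a sketch: it assumes your grep accepts `-f -` to read the patterns from standard input (GNU grep does), and the function name drop_extended is hypothetical.

```shell
# Hypothetical helper; $1 is the word list (e.g. sample.txt).
drop_extended() {
    # Turn every line into an anchored pattern "^LINE.+", i.e. the line
    # followed by at least one more character, then print only the lines
    # of the same file that match none of those patterns.
    sed 's/^/^/; s/$/.+/' "$1" | grep -Evf - "$1"
}
```

Calling drop_extended sample.txt should print the same three lines as above. As with the original command, every line is used as an extended regular expression, so a line containing characters such as . or * or + may match more (or less) than the literal text.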

$ awk 'NR==FNR{tgts[$0]; next} {for (tgt in tgts) if (($0 != tgt) && (index($0,tgt) == 1)) next} 1' targets file
testStr
testmystring
testingStr1

The above was run on these input files:

$ tail -n +0 targets file
==> targets <==
testStr
testmystring
==> file <==
testStr
testmystring
testmystring_1
testmystringwq
testStr_3
testStrasd
testStr-345
testStr1
testingStr1

The above works no matter which characters appear in either file.
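
The original question had only one file, where every line is itself a candidate prefix. A minimal sketch of that case is to pass the same file to awk twice, so the first pass collects the targets and the second pass filters. The function name drop_prefixed is made up for illustration.

```shell
# Hypothetical helper; $1 is the single word list.
drop_prefixed() {
    # Pass 1 (NR==FNR): remember every line as a literal target.
    # Pass 2: skip a line if some other target is a proper prefix of it.
    awk 'NR==FNR { tgts[$0]; next }
         { for (tgt in tgts) if ($0 != tgt && index($0, tgt) == 1) next }
         1' "$1" "$1"
}
```

Because any line that gets removed starts with some shorter line that stays, using all lines as targets at once should give the same result as working through the file one string at a time. The comparison is literal, so a line such as a.b removes a.bc but leaves axbc alone.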
