Using grep to match and remove a pattern and the line before it in a large text file

>I have a very large text file that contains data similar to the following:

he/PRP have/VBD obtain/VBN the/DT ##archbishopric/NN## against/IN the/DT monk/NNS of/IN the/DT
craft/NN ,/Fc he/PRP obtain/VBD the/DT ##archbishopric/NN## of/IN besancon/NP ;/Fx and/CC have/VBD it/PRP in/IN
======>match found: ##sof/IN
succeed/VBN to/TO the/DT ##archbishopric/NN## ./Fp
klutzy/NN little/JJ ##scene/NN## where/WRB 1/Z brave/JJ french/JJ man/NN refuse/VBZ to/TO sit/VB down/RP for/IN fear/NN of/IN be/VBG discover/VBN ./Fp
======>match found: ##swhere/WRBs

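I would like to use grep to match and remove every text line that is immediately followed, right after its newline character, by a line starting with ======>match found:, together with that marker line, as in: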

craft/NN ,/Fc he/PRP obtain/VBD the/DT ##archbishopric/NN## of/IN besancon/NP ;/Fx and/CC have/VBD it/PRP in/IN
======>match found: ##sof/IN

with each such pair ending in a newline.

So, based on the example above, I would like to run grep and get the following output:

he/PRP have/VBD obtain/VBN the/DT ##archbishopric/NN## against/IN the/DT monk/NNS of/IN the/DT
succeed/VBN to/TO the/DT ##archbishopric/NN## ./Fp

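I have tried: grep -E -v '^.+\n======>match found:.+$' file.txt

As suggested here, I added the regex .+\n to the pattern so that it would include the previous line, but it does not work. Any suggestions?

This sed command comes close to what you want:

$ sed -n 'N;/\n======>match found:/d; P;D' textfile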
he/PRP have/VBD obtain/VBN the/DT ##archbishopric/NN## against/IN the/DT monk/NNS of/IN the/DT

succeed/VBN to/TO the/DT ##archbishopric/NN## ./Fp
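
For reuse, the sed approach above could be wrapped in a small shell function. This is only a sketch and a slight variant of the command shown: it uses `$!N` instead of a bare `N`, on the assumption that the file may end with an ordinary text line (with GNU sed and -n, a bare N on the last line exits without printing it). The function name drop_marked_pairs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove each "======>match found:" line together
# with the line directly before it. Reads the named files, or stdin.
#   $!N  append the next line to the pattern space, except on the last line
#   /\n======>match found:/d  the pair is text + marker: discard both lines
#   P    otherwise print only the first line of the pair
#   D    drop that first line and restart the cycle with the remainder
drop_marked_pairs() {
  sed -n '$!N;/\n======>match found:/d;P;D' "$@"
}
```

It would be called as `drop_marked_pairs file.txt`, or at the end of a pipeline.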

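Multi-line grepping is tricky because traditional grep implementations consider only one line at a time, so adding \n to the pattern is pointless: no single line ever contains a newline to match.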

If you have pcregrep available, its -M flag enables multi-line matching:

pcregrep -Mv '^.+\n======>match found:.+$'

Output:

he/PRP have/VBD obtain/VBN the/DT ##archbishopric/NN## against/IN the/DT monk/NNS of/IN the/DT

succeed/VBN to/TO the/DT ##archbishopric/NN## ./Fp
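
Where pcregrep is not installed, a line-at-a-time tool can still handle this by holding each line back until the next one has been seen. The following is a minimal sketch using awk rather than grep, so it is a swapped-in technique and not something from the answers above; the function name drop_with_awk and the variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every line except marker lines and the line
# directly before each marker. Reads the named files, or stdin.
drop_with_awk() {
  awk '
    # Marker line: forget the held line and skip the marker itself.
    /^======>match found:/ { have = 0; next }
    # The held line was not followed by a marker, so it is safe to print.
    have { print held }
    # Hold the current line until the next line has been inspected.
    { held = $0; have = 1 }
    # Flush a final held line that no marker followed.
    END { if (have) print held }
  ' "$@"
}
```

It only ever keeps one line in memory, which may matter for a very large file.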
