How to delete or replace multiline text between two patterns

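I want to add some customer flags to some of my scripts so that they can be parsed before the scripts are packaged by a shell script.

For example, delete all multiline text between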

^([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_BEGIN[_]+\n

and

^([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_END[_]+\n

I would like it to be fault tolerant (with respect to the number of "_"), which is why I am using a regular expression.

For example:

before.foo

i want this
#____NOT_FOR_CUSTOMER_BEGIN________
not this
nor this
#________NOT_FOR_CUSTOMER_END____
and this
//____NOT_FOR_CUSTOMER_BEGIN__
not this again
nor this again
//__________NOT_FOR_CUSTOMER_END____
and this again

would become:

after.foo

i want this
and this
and this again

I would rather use sed, but any clever solution is welcome :)

Something like this:

cat before.foo | tr '\n' '\a' | sed -r 's/([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_BEGIN[_]+\a.*\a([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_END[_]+\a/\a/g' | tr '\a' '\n' > after.foo

sed is the simplest tool for this, since it can delete every line from a start pattern through an end pattern:

sed -E '/_+NOT_FOR_CUSTOMER_BEGIN_+/,/_+NOT_FOR_CUSTOMER_END_+/d' file
i want this
and this
and this again

If you are looking for an awk solution, here is a simpler one:

awk '/_+NOT_FOR_CUSTOMER_BEGIN_+/,/_+NOT_FOR_CUSTOMER_END_+/{next} 1' file

Here is an awk solution written another way, tested with the samples shown.

awk '
/^([#]|[/][/])__+NOT_FOR_CUSTOMER_BEGIN/{ found=1       }
/^([#]|[/][/])__+NOT_FOR_CUSTOMER_END/  { found=""; next}
!found
'  Input_file

With the samples shown, the output is as follows.

i want this
and this
and this again

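Explanation: put simply, whenever the start string is found (by regex), the flag found is set to true, which stops printing. Whenever the end string is found (by regex), the flag is reset to null and next skips the marker line itself, so printing resumes from the following line.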
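The same flag logic can be sketched in Python for readers who prefer it to awk. This is a minimal sketch, not part of the original answer: the function name strip_customer_blocks and the exact patterns (optional # or // prefix, one or more underscores on each side) are assumptions chosen to fit the samples shown.

```python
import re

# Assumed marker format: optional "#" or "//" prefix, underscores,
# the keyword, underscores, then only trailing whitespace.
BEGIN = re.compile(r'^(?:#|//)?_+NOT_FOR_CUSTOMER_BEGIN_+\s*$')
END = re.compile(r'^(?:#|//)?_+NOT_FOR_CUSTOMER_END_+\s*$')


def strip_customer_blocks(lines):
    """Yield only the lines that fall outside BEGIN/END marker blocks."""
    skipping = False
    for line in lines:
        if not skipping and BEGIN.match(line):
            skipping = True       # start marker: stop emitting lines
        elif skipping and END.match(line):
            skipping = False      # end marker: drop it, resume afterwards
        elif not skipping:
            yield line


sample = (
    "i want this\n"
    "#____NOT_FOR_CUSTOMER_BEGIN________\n"
    "not this\n"
    "nor this\n"
    "#________NOT_FOR_CUSTOMER_END____\n"
    "and this\n"
    "//____NOT_FOR_CUSTOMER_BEGIN__\n"
    "not this again\n"
    "nor this again\n"
    "//__________NOT_FOR_CUSTOMER_END____\n"
    "and this again\n"
)

result = "".join(strip_customer_blocks(sample.splitlines(keepends=True)))
print(result, end="")
```

Because the function only needs an iterable of lines, it could also be fed an open file object, which keeps memory use flat for large scripts.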

You can use a Python script:

import re
data = """
i want this
#____NOT_FOR_CUSTOMER_BEGIN________
not this
nor this
#________NOT_FOR_CUSTOMER_END____
and this
//____NOT_FOR_CUSTOMER_BEGIN__
not this again
nor this again
//__________NOT_FOR_CUSTOMER_END____
and this again
"""
rx = re.compile(r'^(#|//)(?:.+\n)+^\1.+\n?', re.MULTILINE)
data = rx.sub('', data)
print(data)

This yields

i want this
and this
and this again

See the demo on regex101.com.

You can match as few lines as possible from NOT_FOR_CUSTOMER_BEGIN_ to NOT_FOR_CUSTOMER_END_

Note that [//] matches a single / rather than //

^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.*)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*

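^ Start of string (start of each line when the multiline flag is set)
(?:#|//) Match # or //
_+NOT_FOR_CUSTOMER_BEGIN_+ Match NOT_FOR_CUSTOMER_BEGIN between 1 or more underscores
(?:\n.*)*? Repeat lines, as few as possible
\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+ Match a newline, then # or // and NOT_FOR_CUSTOMER_END between 1 or more underscores
\n* Remove optional trailing newlines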

Regex demo
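The point about [//] is easy to check directly. The snippet below is a small illustrative sketch (the variable names are made up for this example): a character class lists single characters, so repeating / inside it changes nothing, whereas an alternation inside a group can match the two-character sequence.

```python
import re

# [//] is a character class containing "/" twice, so it equals [/]
# and matches exactly one slash.
class_matches_one_slash = re.fullmatch(r'[//]', '/') is not None
class_matches_two_slashes = re.fullmatch(r'[//]', '//') is not None

# (?:#|//) is an alternation: either "#" or the two characters "//".
group_matches_two_slashes = re.fullmatch(r'(?:#|//)', '//') is not None
group_matches_one_slash = re.fullmatch(r'(?:#|//)', '/') is not None

print(class_matches_one_slash, class_matches_two_slashes)   # True False
print(group_matches_two_slashes, group_matches_one_slash)   # True False
```

This is why the patterns in the question would also accept a marker line that starts with a single /.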

Alternatively, using it with Python:

import re
regex = r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.+)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*"
s = ("i want this\n"
     "#____NOT_FOR_CUSTOMER_BEGIN________\n"
     "not this\n"
     "nor this\n"
     "#________NOT_FOR_CUSTOMER_END____\n"
     "and this\n"
     "//____NOT_FOR_CUSTOMER_BEGIN__\n"
     "not this again\n"
     "nor this again\n"
     "//__________NOT_FOR_CUSTOMER_END____\n"
     "and this again")
subst = ""
# re.MULTILINE lets ^ match at the start of every line, not just the string
result = re.sub(regex, subst, s, flags=re.MULTILINE)
if result:
    print(result)

Output

i want this
and this
and this again
