How to capture the text between a string match and a newline in bash, while ignoring newlines that are followed by a space



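I have a bunch of manifest files that I am trying to iterate over in order to extract the Import-Package header from each. The Import-Package value is wrapped across lines: every continuation line begins with a single space, and this goes on until the import statement ends. After that comes a newline and the next attribute (uri in this example), which has no leading space. I only need to read the Import-Package attribute, that is, the Import-Package line plus every following line that matches the newline-then-space pattern.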

A sample manifest import statement looks like this:

Bnd-LastModified: 1494408636933
Bundle-ManifestVersion: 2
Import-Package: com.advantco.base,com.advantco.base.logging,com.advant
 co.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitu
 tion,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth.
 oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter,
 com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com
 .advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.meta
 data,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest.
 auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.c
 ore.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugar
 crm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,co
 m.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.r
 est.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.c
 ore.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarc
 rm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.cr
 ypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,ja
 vax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml.
 transform.stream,org.apache.commons.codec.binary,org.apache.commons.c
 ollections4.map,org.apache.commons.httpclient,org.apache.commons.http
 client.util,org.json
Require-Capability: osgi.ee;filter:="(&(osgi.ee=JavaSE)(version=1.6))"
Tool: Bnd-3.3.0.201609221906
Export-Package: com.advantco.sugarcrm.core;uses:="com.advantco.base.lo
 gging,com.advantco.sugarcrm.core.object";version="1.0.0",com.advantco
 .sugarcrm.core.adapter;uses:="com.advantco.base,com.advantco.base.log
 ging,com.advantco.base.net,com.advantco.base.variablesubstitution,com
 .advantco.sugarcrm.core,com.advantco.sugarcrm.core.error,com.advantco
 .sugarcrm.core.object,com.advantco.sugarcrm.core.object.metadata";ver
 sion="1.0.0",com.advantco.sugarcrm.core.error;version="1.0.0",com.adv
 antco.sugarcrm.core.iface;uses:="com.advantco.sugarcrm.core.error,com
 .advantco.sugarcrm.core.object";version="1.0.0",com.advantco.sugarcrm
 .core.object;uses:="com.advantco.base,com.advantco.base.mime,com.adva
 ntco.base.net,com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.

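The uri, Require-Capability, or Export-Package header cannot be hard-coded, because some other header may follow Import-Package. So I need to read the Import-Package line and every following line that begins with a space, and stop at the first line that starts a new attribute field, that is, a line with no leading space (not necessarily one of the headers shown).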

The output should look like:

Import-Package: com.advantco.base,com.advantco.base.logging,com.advant
 co.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitu
 tion,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth.
 oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter,
 com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com
 .advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.meta
 data,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest.
 auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.c
 ore.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugar
 crm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,co
 m.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.r
 est.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.c
 ore.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarc
 rm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.cr
 ypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,ja
 vax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml.
 transform.stream,org.apache.commons.codec.binary,org.apache.commons.c
 ollections4.map,org.apache.commons.httpclient,org.apache.commons.http
 client.util,org.json

Then I can strip the newlines so that it looks like:

Import-Package:com.advantco.base,com.advantco.base.logging,com.advantco.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitution,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth.oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter,com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com.advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.metadata,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest.auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.core.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugarcrm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,com.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.rest.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.core.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarcrm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.crypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,javax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml.transform.stream,org.apache.commons.codec.binary,org.apache.commons.collections4.map,org.apache.commons.httpclient,org.apache.commons.httpclient.util,org.json

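I am trying the following, but it only seems to work when the header that follows Import-Package contains an uppercase letter (here, Import-Package: package names... followed by Require-Capability:). In some cases, though, it is Import-Package: package names... followed by url:, and then that line is captured as well.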

`sed -n -e '/Import-Package/,/[A-Z]/ p'` 

But if the manifest looks like this:

Bnd-LastModified: 1494408636933
Bundle-ManifestVersion: 2
Import-Package: com.advantco.base,com.advantco.base.logging,com.advant
 co.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitu
 tion,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth.
 oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter,
 com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com
 .advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.meta
 data,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest.
 auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.c
 ore.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugar
 crm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,co
 m.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.r
 est.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.c
 ore.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarc
 rm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.cr
 ypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,ja
 vax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml.
 transform.stream,org.apache.commons.codec.binary,org.apache.commons.c
 ollections4.map,org.apache.commons.httpclient,org.apache.commons.http
 client.util,org.json
url:http://sample.org

then sample.org is captured as well.

EDIT: Since the OP said that the uri string should not be hard-coded, I am adding this solution now.

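awk '
/Import-Package/{              # header line: start collecting
  flag=1
  val=$0
  next
}
flag && /^ / && NF{            # continuation line: drop the leading space and append
  gsub(/^ /,"")
  val=val?val $0:$0
  next
}
flag && !/^ / && NF{           # next header reached: print the collected value and reset
  print val
  flag=val=""
}
END{                           # Import-Package was the last header in the file
  if(flag) print val
}'  Input_file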

The output will be as follows.

Import-Package: com.advantco.base,com.advantco.base.logging,com.advantco.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitution,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth.oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter,com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com.advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.metadata,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest.auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.core.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugarcrm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,com.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.rest.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.core.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarcrm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.crypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,javax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml.transform.stream,org.apache.commons.codec.binary,org.apache.commons.collections4.map,org.apache.commons.httpclient,org.apache.commons.httpclient.util,org.json
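
The flag-based approach above can be wrapped in a small shell function and run over many manifests, which is what the question ultimately asks for. This is only a sketch under stated assumptions: the function name import_packages and the sample manifest contents are made up for illustration, and every continuation line is assumed to begin with exactly one space.

```shell
# Hypothetical helper: print the unfolded Import-Package header.
# Reads the files given as arguments, or stdin when there are none.
import_packages() {
    awk '
        /^Import-Package:/ { val = $0; flag = 1; next }       # header start
        flag && /^ /       { val = val substr($0, 2); next }  # continuation: drop the one leading space
        flag               { print val; flag = 0 }            # any other line ends the header
        END                { if (flag) print val }            # header was the last thing in the file
    ' "$@"
}

# Demo on a tiny made-up manifest.
result=$(printf '%s\n' \
    'Bundle-ManifestVersion: 2' \
    'Import-Package: com.example.a,com.exam' \
    ' ple.b,org.js' \
    ' on' \
    'url:http://sample.org' | import_packages)
echo "$result"
```

To process a directory tree, a loop such as `for f in */META-INF/MANIFEST.MF; do import_packages "$f"; done` should work; the glob is a placeholder for wherever your manifests actually live.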


First solution: assuming your actual Input_file matches the sample shown, could you please try the following.

awk '
/^uri/{
flag=""
}
/^Import/{
flag=1
}
flag{
sub(/^ +/,"")
val=val?val $0:$0
}
END{
print val
}' Input_file

Second solution: adding a solution that uses RS here.

awk -v RS="uri:" 'FNR==1{gsub(/\n|\n +/,"");print}'  Input_file

Third solution: using both RS and FS here.

awk -v RS="" -v FS="uri:" '{gsub(/\n|\n +/,"",$1);print $1}'  Input_file

Fourth solution: adding one more solution that uses awk's match function.

awk -v RS=""  -v FS="\n" 'match($0,/Import.*uri/){val=substr($0,RSTART,RLENGTH);gsub(/\n|\n +|uri$/,"",val);print val}' Input_file

Note: if only one such block needs to be printed, you can also add exit after the print statement in the last two commands above.

Using Perl

perl -0777 -ne ' s/.*(Import-Package:.+?)\n(?=\S)(.*)/$1/smog; print ' sameer.pkg

Removing the newlines

perl -0777 -ne ' s/.*(Import-Package:.+?)\n(?=\S)(.*)/$1/smog; print ' sameer.pkg | tr -d '\n'
Import-Package: com.advantco.base,com.advantco.base.logging,com.advant co.base.mime,com.advantco.base.net,com.advantco.base.variablesubstitu tion,com.advantco.rest,com.advantco.rest.auth,com.advantco.rest.auth. oauth2,com.advantco.sugarcrm.core,com.advantco.sugarcrm.core.adapter, com.advantco.sugarcrm.core.error,com.advantco.sugarcrm.core.iface,com .advantco.sugarcrm.core.object,com.advantco.sugarcrm.core.object.meta data,com.advantco.sugarcrm.core.rest,com.advantco.sugarcrm.core.rest. auth,com.advantco.sugarcrm.core.rest.metadata,com.advantco.sugarcrm.c ore.rest.op,com.advantco.sugarcrm.core.rest.op.v10,com.advantco.sugar crm.core.rest.parser,com.advantco.sugarcrm.core.rest.parser.object,co m.advantco.sugarcrm.core.rest.parser.xml,com.advantco.sugarcrm.core.r est.service,com.advantco.sugarcrm.core.result,com.advantco.sugarcrm.c ore.result.v10,com.advantco.sugarcrm.core.service,com.advantco.sugarc rm.core.util,com.advantco.sugarcrm.core.xml,javax.activation,javax.cr ypto,javax.crypto.spec,javax.mail,javax.xml.bind,javax.xml.parsers,ja vax.xml.stream,javax.xml.transform,javax.xml.transform.dom,javax.xml. transform.stream,org.apache.commons.codec.binary,org.apache.commons.c ollections4.map,org.apache.commons.httpclient,org.apache.commons.http client.util,org.json

My assumptions:

  • The "Import-Package:" line may start in the middle of the file.
  • The next attribute is not always "uri".

Then how about:

awk '/^Import-Package:/,!/^Import-Package:/&&!/^ / {
if (!line || sub(/^ /, "")) line = line $0}
END {print line}
' sample.txt

It reads from the "Import-Package:" line up to the line of the next attribute (which is discarded), joining the lines while removing each leading space.

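This might work for you (GNU sed):

sed -n '/^Import-Package:/{:a;N;s/\n //;ta;P;D}' file

The -n option means text is printed only explicitly. Starting at the first line that begins with Import-Package:, append the next line. If the appended line begins with a space, remove that newline and space, and if the substitution succeeded, repeat until the appended line no longer matches. Then print the first line of the pattern space, delete the first line of the pattern space, and repeat.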
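
To see the loop in action, here is a small run on a made-up manifest (the package names are invented, and GNU sed is assumed, as in the answer). The second command is a variation that is not part of the answer: it uses `$!N` so that a header sitting at the very end of the file is still printed, since with -n GNU sed exits without output when N runs out of input.

```shell
# Made-up manifest; continuation lines start with one space.
manifest='Bundle-ManifestVersion: 2
Import-Package: com.example.a,com.exam
 ple.b,org.json
url:http://sample.org'

# The command from the answer, reading from stdin instead of a file.
unfolded=$(printf '%s\n' "$manifest" |
    sed -n '/^Import-Package:/{:a;N;s/\n //;ta;P;D}')
echo "$unfolded"

# Variation (an assumption, not from the answer): $!N skips N on the last
# line, so a header that ends the file is still printed.
tail_case=$(printf '%s\n' 'Import-Package: com.example.a,com.exam' ' ple.b' |
    sed -n '/^Import-Package:/{:a;$!N;s/\n //;ta;P;D}')
echo "$tail_case"
```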

There are plenty of awk answers, but this is entirely doable in sed as well.

If you just want to print the block as-is:

$: sed -n '
/^Import-Package: /,/^[^ ]/ {
/^Import-Package:/ p;
/^ / p;
}
' infile

In GNU sed this can all be stacked on one line.

$: sed -n '/^Import-Package: /,/^[^ ]/ { /^Import-Package:/ p; /^ / p; }' infile

Explanation

$: sed -n ' ...     ' infile

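Using sed with -n suppresses all output except what explicit commands print; it reads from a file named infile (in this example), so adjust as needed. Inside the single quotes, the program says: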

/^Import-Package: /,/^[^ ]/ {
/^Import-Package:/ p; 
/^ / p;
}

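Starting at any line that begins with Import-Package: and continuing through the next line that begins with any non-space character (meaning a literal space character here, not whitespace in general), execute all the commands from this opening brace to the matching closing brace.

Within that block, print any line that begins with Import-Package:. Also print any line that begins with a space.

No command prints a line that begins with a non-space unless it is an Import-Package: line, so if another header starts below the block, that header is not printed, and the range toggles off, so nothing else is printed unless another Import-Package: block begins.

If the block ends the file, the range never closes, so it keeps printing until it runs out of records.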
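
A quick way to check the range behaviour described above is to feed the one-line form a tiny made-up manifest in which the header after Import-Package has a continuation line of its own. The header values below are invented for illustration.

```shell
# The Export-Package line closes the range and is not printed; its
# continuation line falls outside the range, so it is not printed either.
block=$(printf '%s\n' \
    'Bundle-ManifestVersion: 2' \
    'Import-Package: com.example.a,com.exam' \
    ' ple.b,org.json' \
    'Export-Package: com.example.c;version="1.0' \
    ' .0"' |
    sed -n '/^Import-Package: /,/^[^ ]/ { /^Import-Package:/ p; /^ / p; }')
printf '%s\n' "$block"
```

Only the Import-Package line and its own continuation line should come out, still wrapped as in the input.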

If you want it to print the block all on one line with the spaces removed:

$: sed -n '
/^Import-Package: /,/^[^ ]/ {
/^Import-Package:/ { h; d; }
/^ / H;
/^[^ ]/ { s/.*//; x; s/\n* //g; p; d;   }
$       {         x; s/\n* //g; p; d;   }
}
' infile

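For the lines from /^Import-Package: / through the next line whose first character is a non-space,

  • If the line begins with Import-Package:, replace the hold space with it and delete it from the pattern space to trigger a clean next read.
  • If the line begins with a space, append it to the hold space.
  • If the line begins with a non-space, scrub it with s/.*//; the rest applies to the last line ($) as well, so in either case x puts the accumulated hold space back into the pattern space (technically, it swaps them), s/\n* //g removes every newline-and-space sequence, p prints the line, and d deletes it to get a clean buffer for the next cycle (which exits at end of file).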
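
The hold-space steps listed above can be tried on a small made-up manifest (invented package names, GNU sed assumed). Note that because `s/\n* //g` removes every space, the space after "Import-Package:" disappears too, which matches the single-line output shown in the question.

```shell
joined=$(printf '%s\n' \
    'Bundle-ManifestVersion: 2' \
    'Import-Package: com.example.a,com.exam' \
    ' ple.b,org.json' \
    'url:http://sample.org' |
    sed -n '
/^Import-Package: /,/^[^ ]/ {
/^Import-Package:/ { h; d; }
/^ / H;
/^[^ ]/ { s/.*//; x; s/\n* //g; p; d; }
$       { x; s/\n* //g; p; d; }
}')
echo "$joined"
```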

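The rest is an unnecessary alternative, but since I misread the request the first time, I am leaving it in case it helps someone else.

If you want all the packages split apart and each printed on its own line (which is what I originally thought you meant), then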

$: sed -n '
$ {
/^Import-Package: / {
s/^Import-Package: //; s/,/\n/g; p;
}
}
/^Import-Package: /,/^[^ ]/ {
/^Import-Package:/ { s/^Import-Package://; h; n; }
/^ / H;
/^[^ ]/ { s/.*//; x; s/\n* //g; s/,/\n/g; p; d;   }
$       { s/.*//; x; s/\n* //g; s/,/\n/g; p; d;   }
}
' infile

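If Import-Package: can never start on the last line of the file, you can drop the $ block at the top. If it can never be the last block in the file, you can also drop the $ line at the bottom of the main block.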
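
For comparison, the one-package-per-line variant can also be sketched with awk and tr. This is an alternative illustration rather than the method used in this answer; the function name list_packages and the sample data are made up, and it assumes that no package clause contains a quoted comma (a version range such as version="[1.0,2.0)" would be split in the wrong place).

```shell
# Hypothetical helper: print each imported package on its own line.
# Reads the files given as arguments, or stdin when there are none.
list_packages() {
    awk '
        /^Import-Package:/ { sub(/^Import-Package: */, ""); val = $0; flag = 1; next }
        flag && /^ /       { val = val substr($0, 2); next }  # unfold a continuation line
        flag               { exit }                           # next header: stop reading
        END                { if (flag) print val }            # runs after exit as well
    ' "$@" | tr ',' '\n'
}

packages=$(printf '%s\n' \
    'Import-Package: com.example.a,com.exam' \
    ' ple.b,org.json' \
    'Tool: Bnd' | list_packages)
printf '%s\n' "$packages"
```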

c.f. the GNU sed manual for a breakdown of each command. I can come back and elaborate here if you like.
