How to make "grep -zoP" display each match separately?

I have a file of this form:

X/this is the first match/blabla
X-this is
the second match-
and here we have some fluff.

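I want to extract everything that appears after "X" and between the same markers. So if I have "X+match+", I want to get "match", because it appears after "X" and between the "+" markers.

So, for the given sample file, I would like to have this output: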

this is the first match

and then

this is
the second match

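I managed to get everything between X and the marker by using

grep -zPo '(?<=X(.))(.|\n)+(?=\1)' file

That is:

grep -Po '(?<=X(.))(.|\n)+(?=\1)' matches X followed by (something), which is captured and then matched again at the end by (?=\1) (I based this code on an earlier answer of mine).
Note that I use (.|\n) to match anything, including a newline, and that I also pass -z to grep so that matches can span newlines.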

So this works well. The only problem is how the output is displayed:

$ grep -zPo '(?<=X(.))(.|\n)+(?=\1)' file
this is the first matchthis is
the second match

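As you can see, all the matches appear together: "this is the first match" is followed by "this is the second match" with no separator at all. I know this comes from the use of -z, which treats the whole file as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline (quoting "man grep").

So: is there a way to get each of these results separately?

I also tried GNU Awk:

awk 'match($0, /X(.)(\n|.*)\1/, a) {print a[1]}' file

But even that did not work.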

awk does not support backreferences within a regexp definition.

A workaround:

$ grep -zPo '(?s)(?<=X(.)).+(?=\1)' ip.txt | tr '\0' '\n'
this is the first match
this is
the second match
# with ripgrep, which supports multiline matching
$ rg -NoUP '(?s)(?<=X(.)).+(?=\1)' ip.txt
this is the first match
this is
the second match
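
The tr step works because grep -z does separate the matches: it writes a NUL byte after each one, and a terminal simply does not show it. Below is a minimal sketch to check this, assuming GNU grep built with -P support. The temporary file stands in for ip.txt, and the variable names are only illustrative.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'X/this is the first match/blabla\nX-this is\nthe second match-\nand here we have some fluff.\n' > "$sample"

# Keep only the NUL bytes of grep's output and count them:
# grep -z -o emits one NUL terminator per match.
nul_count=$(grep -zPo '(?s)(?<=X(.)).+(?=\1)' "$sample" | tr -cd '\0' | wc -c | tr -d ' ')
echo "NUL terminators: $nul_count"

rm -f "$sample"
```

With the two matches in the sample file, the count should be 2, which is why translating each NUL to a newline puts every match on its own line.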

You can also use (?s)X(.)\K.+(?=\1) instead of (?s)(?<=X(.)).+(?=\1). Also, you may want to use a non-greedy quantifier here, so that the input X+match+xyz+foobaz+ does not yield match+xyz+foobaz.

With perl:

$ perl -0777 -nE 'say $& while(/X(.)\K.+(?=\1)/sg)' ip.txt
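
To illustrate the greedy versus non-greedy difference, here is a small sketch run against the example input X+match+xyz+foobaz+. It assumes GNU grep with -P support, and the variable names are only for illustration.

```shell
# Example input where the marker "+" occurs more than twice.
input='X+match+xyz+foobaz+'

# Greedy .+ runs up to the LAST "+" that can still satisfy (?=\1).
greedy=$(printf '%s\n' "$input" | grep -zPo '(?s)(?<=X(.)).+(?=\1)' | tr '\0' '\n')

# Lazy .+? stops at the FIRST following "+".
lazy=$(printf '%s\n' "$input" | grep -zPo '(?s)(?<=X(.)).+?(?=\1)' | tr '\0' '\n')

printf 'greedy: %s\nlazy:   %s\n' "$greedy" "$lazy"
```

The greedy pattern returns match+xyz+foobaz, while the lazy one returns only match.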
this is the first match
this is
the second match

Here is another GNU awk solution, using RS and RT:

awk -v RS='X.' 'ch != "" && n=index($0, ch) {
print substr($0, 1, n-1)
}
RT {
ch = substr(RT, 2, 1)
}' file

this is the first match
this is
the second match

Using GNU awk for multi-character RS, RT, and gensub(), without having to read the whole file into memory:

$ awk -v RS='X.' 'NR>1{print "<" gensub(end".*","",1) ">"} {end=substr(RT,2,1)}' file
<this is the first match>
<this is
the second match>

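Obviously, I added the "<" and ">" so that you can see where each output record starts and ends.

The above assumes that the character after X is not a non-repetition regexp metacharacter (e.g. ., ^, [, etc.), so YMMV.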

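The use case is somewhat problematic, because as soon as you print the matches you lose the information about exactly where each separator was. But if that is acceptable, try piping to xargs -r0:

grep -zPo '(?<=X(.))(.|\n)+(?=\1)' file | xargs -r0

These options are GNU extensions, but so are grep -z and (mostly) grep -P, so that may be acceptable.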
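
If the boundaries between matches should stay visible, one possible variation is to have xargs run a command once per match. The sketch below assumes GNU grep and GNU xargs. The (?s) flag with a lazy quantifier replaces the (.|\n)+ from the command above, and the < > brackets and the temporary sample file are added only for illustration.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf 'X/this is the first match/blabla\nX-this is\nthe second match-\nand here we have some fluff.\n' > "$sample"

# -n1 makes xargs invoke printf once per NUL-terminated match;
# the < > brackets only mark where each match starts and ends.
bracketed=$(grep -zPo '(?s)(?<=X(.)).+?(?=\1)' "$sample" | xargs -r0 -n1 printf '<%s>\n')
printf '%s\n' "$bracketed"

rm -f "$sample"
```

Because each match reaches printf as a single argument, a match that spans several lines stays inside one pair of brackets.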

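GNU grep -z terminates input/output records with null characters (useful in conjunction with other tools such as sort -z). pcregrep does not do that:

pcregrep -Mo2 '(?s)X(.)(.+?)\1' file

This uses -onumber instead of lookarounds, and adds the lazy quantifier ? (in case \1 occurs again later).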
