Python regular expression with a backreference



I need to search text like this:

lines = """package p_dio_bfm is
   procedure setBFMCmd (  
      variable  pin : in tBFMCmd
      );
end p_dio_bfm; -- end package;
package body p_dio_bfm is
   procedure setBFMCmd (  
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end p_dio_bfm;"""

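I need to extract the package name, i.e. p_dio_bfm, and the package declaration, i.e. the part between "package p_dio_bfm is" and the FIRST "end p_dio_bfm;".

The problem is that the package declaration may end with either "end p_dio_bfm;" or "end package;". So I tried the following "OR" regular expression. It works for packages ending with "end package;", but not for packages ending with "end pck_name;":

pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;")
match = pattern.search(lines)

The problem is the (package|\1) part of the regular expression, where I need to match either the word "package" or the package name captured earlier.

UPDATE: I have added complete code, which I hope clarifies the question: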

import re
lines1 = """package p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
end p_dio_bfm;
package body p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end p_dio_bfm;"""
lines2 = """package p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
end package;
package body p_dio_bfm is
   procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end package;"""
lines1 = lines1.replace('\n', ' ')
print lines1
pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;")
match = pattern.search(lines1)
print match
lines2 = lines2.replace('\n', ' ')
print lines2
match = pattern.search(lines2)
print match

In both cases I want a single regular expression to return this part:

"""procedure setBFMCmd (
          variable  pin : in tBFMCmd
          );"""  

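Your regex doesn't match anything because it is incorrect. Without the re.DOTALL flag, .* cannot match newline characters, so you can use [\s\S]* instead:

r'package ([^\s]+)\s+is([\s\S]*)end\s+(package|\1)\s*;'

See the demo: https://regex101.com/r/tZ3uH0/1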
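To make the newline point concrete, here is a minimal Python 3 sketch. The one-package sample string is made up for illustration and is not the asker's data; it only shows that '.' stops at a newline by default, while [\s\S] or re.DOTALL lets the group span lines.

```python
import re

# Hypothetical three-line sample, not taken from the question.
text = "package p is\n  body\nend p;"

# Without re.DOTALL, '.' does not match '\n', so '.*' cannot span lines.
no_flag = re.search(r"is(.*)end", text)

# '[\s\S]' matches any character, newline included, with no flag needed.
m_class = re.search(r"is([\s\S]*)end", text)

# re.DOTALL gives '.' the same reach.
m_flag = re.search(r"is(.*)end", text, re.DOTALL)

print(no_flag)                  # None
print(repr(m_class.group(1)))   # '\n  body\n'
print(repr(m_flag.group(1)))    # '\n  body\n'
```

Note that re.MULTILINE is a different flag: it only changes where ^ and $ match and has no effect on '.'.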

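But there are some other issues here: your string contains 2 package blocks. Also, as a more elegant and efficient approach, you can use the re.DOTALL flag, which makes the '.' special character match any character at all, including a newline. So you can write your regex like this:

pattern = re.compile("package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;", re.DOTALL)

But this still matches only the first block: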

>>> match = pattern.search(lines)
>>> print match.group(0)
package p_dio_bfm is
   procedure setBFMCmd (  
      variable  pin : in tBFMCmd
      );
end p_dio_bfm; -- end package;
>>> print match.group(1)
p_dio_bfm
>>> print match.group(2)
   procedure setBFMCmd (  
      variable  pin : in tBFMCmd
      );
end p_dio_bfm; -- 
>>> print match.group(3)
package
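One detail that may be worth checking in the session above: the pattern is an ordinary (non-raw) string literal, so Python converts "\1" into the character chr(1) before the re module ever sees it. That would explain why group(3) is "package" even though "end p_dio_bfm;" comes first. The sketch below (Python 3, with a made-up one-line sample) isolates that effect; the other backslashes are doubled so that only "\1" differs between the two patterns.

```python
import re

# In an ordinary string literal "\1" is an octal escape for chr(1);
# in a raw string it stays backslash + "1", i.e. a backreference.
print("\1" == chr(1))      # True
print(len(r"\1"))          # 2

sample = "package foo is x end foo;"  # hypothetical sample text

# Same pattern twice; only the handling of \1 differs.
plain = re.compile("package\\s+(\\w+)\\s+is(.*)end\\s+(package|\1)\\s*;", re.DOTALL)
raw = re.compile(r"package\s+(\w+)\s+is(.*)end\s+(package|\1)\s*;", re.DOTALL)

no_match = plain.search(sample)  # chr(1) never occurs in the text
match = raw.search(sample)       # \1 repeats the captured name "foo"

print(no_match)                  # None
print(match.group(1), repr(match.group(2)), match.group(3))
```

With a raw string the backreference works, but the greedy (.*) then runs to the last matching "end ...;" in the input, so a non-greedy (.*?) is still needed when the text holds more than one block.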

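To match all the blocks, you need to allow for words such as body before the package name, and make the middle group non-greedy:

package\s+(?:\w+\s+?)?([^\s]+)\s+is(.*?)end\s+(package|\1)\s*;

See the demo: https://regex101.com/r/tZ3uH0/3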
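As a rough sketch of how that pattern could be used from Python 3, the snippet below runs it with re.finditer over a shortened, made-up variant of the question's input (one block closed with "end package;", the other with "end p_dio_bfm;"). The names block_re, vhdl and blocks are illustrative only.

```python
import re

# Shortened, hypothetical variant of the question's input.
vhdl = """package p_dio_bfm is
   procedure setBFMCmd (pin : in tBFMCmd);
end package;
package body p_dio_bfm is
   procedure setBFMCmd (pin : in tBFMCmd) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
end p_dio_bfm;"""

# Raw string so that \1 is a real backreference; (.*?) is non-greedy so each
# match stops at the first "end package;" or "end <name>;" it can use.
block_re = re.compile(
    r"package\s+(?:\w+\s+?)?([^\s]+)\s+is(.*?)end\s+(package|\1)\s*;",
    re.DOTALL,
)

# One tuple per block: (package name, body text, closing word).
blocks = [
    (m.group(1), m.group(2).strip(), m.group(3))
    for m in block_re.finditer(vhdl)
]

for name, body, closer in blocks:
    print(name, "closed by", closer)
    print(body)
```

The "end setBFMCmd;" inside the package body is skipped because the closing word must be either "package" or the captured package name.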

How about:

>>> for row in re.findall(
...   r'package(?:\s.*?)(?P<needle>[^\s]+)\s+is\s+(.*?)end\s+(?:package|(?P=needle));',
...   lines,
...   re.S
... ):
...   print '{{{', row[1], '}}}'
...
{{{ procedure setBFMCmd (
      variable  pin : in tBFMCmd
      );
}}}
{{{ procedure setBFMCmd (
      variable  pin : in tBFMCmd
      ) is
   begin
      bfm_cmd := pin;
   end setBFMCmd;
}}}

I took the liberty of not filtering the way @mihai-hangiu asked, and included the second block as well.
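If only the first package declaration is wanted, as in the question, a named group can be combined with re.search. The following is a minimal Python 3 sketch under that assumption; the helper name first_package_declaration and the short sample strings are hypothetical, not part of the original thread.

```python
import re

# (?P<name>...) captures the package name; (?P=name) must repeat it exactly.
# Requiring "is" right after the name skips "package body <name> is" blocks.
DECL_RE = re.compile(
    r"package\s+(?P<name>\w+)\s+is\s+(?P<decl>.*?)\s*"
    r"end\s+(?:package|(?P=name))\s*;",
    re.DOTALL,
)

def first_package_declaration(source):
    """Return (name, declaration text) for the first package, or None."""
    m = DECL_RE.search(source)
    if m is None:
        return None
    return m.group("name"), m.group("decl")

# Hypothetical inputs: the same code closed in the two allowed ways.
with_name = (
    "package p is\n   procedure f (x : in t);\nend p;\n"
    "package body p is\nend p;"
)
with_keyword = with_name.replace("end p;", "end package;")

print(first_package_declaration(with_name))
print(first_package_declaration(with_keyword))
```

Both calls should yield the same name and declaration text, which is the behaviour the question asks for from a single expression.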
