Suppose I have the following lines in the file testFile:
Test Line in File
Test Line in File
Test Line in File
Test Line in File Line
Test Line in File Line
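Is it possible to do a re.findall() that finds at most one instance of a pattern per line? For example, if I run len(re.findall("Line", testfile, 0)), the program returns 7. I want it to return 5. I was thinking of something like "Line.*\n", but that still returns 7. To clarify, I want to avoid using:
count = 0
with open(testFile, "r") as file:
    for line in file:
        re.match(pattern, line, 0)
        # etc.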
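For a match as simple as this one, it is more efficient to use this…
count = 0
with open(testFile, "r") as file:
    for line in file:
        if 'Line' in line:
            count += 1
…which uses a highly optimized substring search algorithm and is much faster than using a regular expression (roughly 8 times faster).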
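To check the speed claim on your own machine, a timing comparison along these lines may help. This is only a rough sketch: the in-memory list `LINES`, the helper names `count_with_in` and `count_with_regex`, and the repetition counts are all made up for illustration, and the ratio you see will vary with the Python version and the data.

```python
import re
import timeit

# Sample data mirroring the question's file, repeated so the timing is measurable.
# (Hypothetical in-memory stand-in for reading testFile line by line.)
LINES = [
    "Test Line in File\n",
    "Test Line in File\n",
    "Test Line in File\n",
    "Test Line in File Line\n",
    "Test Line in File Line\n",
] * 1000

PATTERN = re.compile("Line")  # precompiled so only the search itself is timed


def count_with_in(lines):
    """Count lines containing the substring, using the `in` operator."""
    return sum(1 for line in lines if "Line" in line)


def count_with_regex(lines):
    """Count lines containing the substring, using a compiled regex."""
    return sum(1 for line in lines if PATTERN.search(line))


if __name__ == "__main__":
    t_in = timeit.timeit(lambda: count_with_in(LINES), number=20)
    t_re = timeit.timeit(lambda: count_with_regex(LINES), number=20)
    print(f"in: {t_in:.4f}s  regex: {t_re:.4f}s  ratio: {t_re / t_in:.1f}x")
```

Both helpers count each line at most once, so they return the same number; only the time taken differs.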
You can use the MULTILINE flag!
>>> import re
>>> s = """Test Line in File
... Test Line in File
... Test Line in File
... Test Line in File Line
... Test Line in File Line"""
>>> r = re.compile("^.*Line.*$", flags=re.MULTILINE)
>>> r.findall(s)
['Test Line in File',
'Test Line in File',
'Test Line in File',
'Test Line in File Line',
'Test Line in File Line']
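If only the count is needed, the same pattern can be fed to finditer and the matches tallied without building the list. A minimal sketch, assuming the file contents are already in a string (the variable names here are illustrative):

```python
import re

text = """Test Line in File
Test Line in File
Test Line in File
Test Line in File Line
Test Line in File Line"""

# With re.MULTILINE, ^ and $ anchor at every line, and "." never crosses a
# newline, so each matching line yields exactly one match no matter how many
# times "Line" occurs on it.
line_pattern = re.compile(r"^.*Line.*$", flags=re.MULTILINE)

lines_with_match = sum(1 for _ in line_pattern.finditer(text))
total_occurrences = len(re.findall("Line", text))

print(lines_with_match)   # 5
print(total_occurrences)  # 7
```

The second count is the 7 the question started from, which shows that the line anchors are what collapse repeated hits on one line into a single match.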
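However, I would discourage using a regular expression over the whole file in this case!
Loading the entire file into memory just to run re.findall makes no sense, and doing so also loses the ability to short-circuit as soon as the first match on a line is found.
import re
with open('data.txt') as f:
    # re.search stops at the first hit on each line, so a line counts at most once
    print(sum(1 if re.search(r"Line", line) else 0 for line in f))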
5