Python: search for different strings on the same line



I have the following code that I want to optimize:

if re.search(str(stringA), line) and re.search(str(stringB), line):
    .....
    .....

I tried:

stringAB = stringA + '.*' + stringB
if re.search(str(stringAB), line):
    .....
    .....

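But the results I get are not reliable. I'm using re.search here because it seems to be the only way to search for the exact regular expressions given in stringA and stringB.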
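One likely reason the combined pattern seems unreliable is that stringA + '.*' + stringB only matches when stringA occurs *before* stringB on the line. The two separate re.search calls accept the strings in either order. If a single regex is wanted, a common alternative (not from the original post) is to put each pattern in its own lookahead. The following is a minimal sketch that uses the values from the question's egrep example and some made-up sample lines. As in the question, the strings are interpolated as regex patterns and are not escaped:

```python
import re

# Values taken from the question's egrep example; the sample lines are made up.
stringA = "Success"
stringB = "mysqlDB01"

# The attempt from the question: only matches when stringA comes before stringB.
ordered = re.compile(stringA + ".*" + stringB)

# Each (?=.*X) lookahead scans ahead from the start of the line without
# consuming anything, so the two patterns may appear in either order.
either_order = re.compile(r"^(?=.*{})(?=.*{})".format(stringA, stringB))

lines = [
    "Success: connected to mysqlDB01",
    "mysqlDB01 reported Success",
    "Success: connected to mysqlDB02",
]

results = [(bool(ordered.search(line)), bool(either_order.search(line))) for line in lines]
for line, (o, e) in zip(lines, results):
    print(o, e, repr(line))
```

The second sample line is the case where the ordered pattern fails but the two separate searches (and the lookahead version) succeed.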

The logic behind this code is modeled on the following egrep command example:

stringA=Success
stringB=mysqlDB01
egrep "${stringA}" /var/app/mydata | egrep "${stringB}"
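The egrep pipeline can be mirrored in Python by chaining two line filters. Each filter keeps only the lines where its pattern matches. This is a minimal sketch. The helper name grep and the sample lines are hypothetical stand-ins for the real file contents:

```python
import re

stringA = "Success"
stringB = "mysqlDB01"

def grep(pattern, lines):
    """Hypothetical helper: lazily yield the lines where `pattern` matches,
    like a single egrep stage."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))

# Made-up sample lines standing in for the contents of /var/app/mydata.
sample = [
    "Success: query on mysqlDB01",
    "Failure: query on mysqlDB01",
    "Success: query on mysqlDB02",
]

# Same filtering as: egrep "$stringA" file | egrep "$stringB"
matches = list(grep(stringB, grep(stringA, sample)))
print(matches)  # ['Success: query on mysqlDB01']
```

With a real file, you can pass an open file object in place of sample, because iterating over a file yields its lines. Those lines keep their trailing newline.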

If there is a better way to do this without re.search, please let me know.
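If stringA and stringB are plain text rather than regular expressions, the re module is not needed at all, because Python's in operator tests for a substring. This is a minimal sketch that assumes literal (non-regex) search strings. Note that it also matches inside longer words, which the \b-based patterns in the answer do not:

```python
stringA = "Success"
stringB = "mysqlDB01"

def has_both(line, a, b):
    # Literal substring tests: characters such as '.' or '*' in a or b are
    # not special here. `and` short-circuits, so b is only searched for
    # when a was found.
    return a in line and b in line

lines = [
    "mysqlDB01 reported Success",
    "Success on mysqlDB02",
    "Successful run on mysqlDB01",
]
flags = [has_both(line, stringA, stringB) for line in lines]
print(flags)  # [True, False, True]
```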

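One way to do this is to build a pattern that matches either word (using \b so that we only match whole words), use re.findall to collect all the matches in the string, and then use set equality to check that both words were matched.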

import re
stringA = "spam"
stringB = "egg"
words = {stringA, stringB}
# Make a pattern that matches either word; \b anchors each word at word boundaries
pat = re.compile(r"\b{}\b|\b{}\b".format(stringA, stringB))
data = [
    "this string has spam in it",
    "this string has egg in it",
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
    "the word spams shouldn't match",
    "and eggs shouldn't match, either",
]
for s in data:
    found = pat.findall(s)
    print(repr(s), found, set(found) == words)

Output

'this string has spam in it' ['spam'] False
'this string has egg in it' ['egg'] False
'this string has egg in it and another egg too' ['egg', 'egg'] False
'this string has both egg and spam in it' ['egg', 'spam'] True
"the word spams shouldn't match" [] False
"and eggs shouldn't match, either" [] False

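A slightly more efficient way to do set(found) == words is words.issubset(found), because it skips the explicit conversion of found to a set. Since pat can only match stringA or stringB, found never contains any other word, so checking that words is a subset of found is equivalent to checking equality.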
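A short sketch comparing the two checks on two of the sample lines above:

```python
import re

stringA = "spam"
stringB = "egg"
words = {stringA, stringB}
pat = re.compile(r"\b{}\b|\b{}\b".format(stringA, stringB))

data = [
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
]
results = []
for s in data:
    found = pat.findall(s)  # a list, possibly with repeats
    # set.issubset accepts any iterable, so the list can be passed directly.
    results.append((words.issubset(found), set(found) == words))
    print(repr(s), results[-1])
```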


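As Jon Clements mentioned in the comments, we can simplify and generalize the pattern to handle any number of words. We should also use re.escape in case any of the words contain regex metacharacters.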

pat = re.compile(r"\b({})\b".format("|".join(re.escape(word) for word in words)))
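To show why re.escape matters, the following sketch uses a hypothetical word list of three words. One of them, "v1.2", contains '.', which matches any character unless it is escaped:

```python
import re

# Hypothetical word list: any number of words, including "v1.2", whose '.'
# is a regex metacharacter that matches any character unless escaped.
words = ["spam", "egg", "v1.2"]
escaped = re.compile(r"\b({})\b".format("|".join(re.escape(word) for word in words)))
unescaped = re.compile(r"\b({})\b".format("|".join(words)))

s = "spam and egg with v1x2"
print(escaped.findall(s))    # ['spam', 'egg']
print(unescaped.findall(s))  # ['spam', 'egg', 'v1x2']

t = "spam and egg with v1.2"
print(set(words).issubset(escaped.findall(t)))  # True
```

One caveat: \b marks a boundary between a word character and a non-word character. A word that begins or ends with a non-word character (for example c++) therefore will not match as expected with this pattern, even after escaping.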

Thanks, Jon!


Here is a version that matches the words in the specified order. If a match is found, it prints the matched substring; otherwise it prints None.

import re
stringA = "spam"
stringB = "egg"
words = [stringA, stringB]
# Make a pattern that matches all the words, in order; .*? lazily skips
# whatever lies between consecutive whole words
pat = r"\b.*?\b".join([re.escape(word) for word in words])
pat = re.compile(r"\b" + pat + r"\b")
data = [
    "this string has spam and also egg, in the proper order",
    "this string has spam in it",
    "this string has spamegg in it",
    "this string has egg in it",
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
    "the word spams shouldn't match",
    "and eggs shouldn't match, either",
]
for s in data:
    found = pat.search(s)
    if found:
        found = found.group()
    print('{!r}: {!r}'.format(s, found))

Output

'this string has spam and also egg, in the proper order': 'spam and also egg'
'this string has spam in it': None
'this string has spamegg in it': None
'this string has egg in it': None
'this string has egg in it and another egg too': None
'this string has both egg and spam in it': None
"the word spams shouldn't match": None
"and eggs shouldn't match, either": None
