我想捕获整个子查询,无论两者之间是否有concat或子字符串函数(即忽略子查询内的另一个括号打开和关闭)。(a)我们不想捕获"加入";作为一个词(b)"别名";不会总是跟在"join"后面。它可以是任何东西(字边界、空格或"join")。词)。
Case 1: select
中没有concat或子字符串函数:
(select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2 join
:
(select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2
案例2:select
中的Concat函数:
(select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2 join
:
(select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2
我试过了:
方法一:re.findall('(select.*?)s[a-zA-Z0-9_]+', input statement)
方法2:由@TheFourthBird建议
import re
pat1 = '(select.*?)s[a-zA-Z0-9_]+'
pat2 = "(select [^()]*(?:(((?>[^()]+|(?1))*)))?[^()]*)[^()n]+"
string1 = "(select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2"
string2 = "(select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2"
print(re.findall(pat1, string1))
print(re.findall(pat1, string2))
import regex as re
print(re.findall(pat2, string1))
print(re.findall(pat2, string2))
pattern = re.compile(pat2, re.UNICODE)
print([match.group(0) for match in pattern.finditer(string2)])
输出:
["(select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2"]
['(select concat(t1.col1, t2.col1, t3.col1) as']
['']
['(t1.col1, t2.col1, t3.col1)']
["(select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2 join "]
上面的方法有什么问题:
方法1:适用于情况1,但不适用于情况2。
方法2:仍然不工作!然而,
["(select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2 join "]
是最接近预期的。但是,它不应该捕获alias2旁边的内容。
请帮帮我!
您可以首先匹配(select
,并可选择使用PyPi regex模块匹配平衡括号。
在模式的末尾,匹配一个空白字符并使用您的字符类。
(select [^()]*(?:(((?>[^()]+|(?1))*)))?[^()]*)s[a-zA-Z0-9_]+
在部分中,模式匹配:
(select
匹配(select
[^()]*
可选择匹配除(
和)
以外的任何字符(?:
非捕获组(
Capture组1((?>[^()]+|(?1))*)
匹配(
并使用递归模式递归第一个子组(捕获组1),最后匹配)
)
关闭组1
)?
关闭非捕获组并使其为可选[^()]*)
可选地匹配除括号外的任何字符,然后匹配)
s[a-zA-Z0-9_]+
匹配字符类 中列出的一个空白字符和1+
查看正则表达式演示和Python演示
例如,使用re.finditer(因为re.findall返回捕获组值):
import regex as re
pattern = r"(select [^()]*(?:(((?>[^()]+|(?1))*)))?[^()]*)s[a-zA-Z0-9_]+"
s = ("In: (select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2nn"
"Out: (select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2nn"
"In: (select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2nn"
"Out: (select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2n")
matches = re.finditer(pattern, s)
for matchNum, match in enumerate(matches, start=1):
print(match.group())
输出["(select t1.col1 as alias1 from db.tb where t1.col1='val1') alias2"]
["(select concat(t1.col1, t2.col1, t3.col1) as alias1 from db.tb where t1.col1='val1') alias2"]
注意匹配SQL是容易出错的