Regex to replace only expressions wrapped in a specific function



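I would like to create a Python function that takes a long SQL script I use to create a table and substitutes the session variables into the script, so that it can be used as a view within Snowflake. For example: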

SET TABLE_NAME = MY_TABLE_NAME;
CREATE OR REPLACE VIEW MY_VIEW AS (
SELECT * FROM IDENTIFIER($TABLE_NAME)
)

With the Python script, the previous block becomes:

CREATE OR REPLACE VIEW MY_VIEW AS(
SELECT * FROM MY_TABLE
)

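However, during testing I realized that if the variable name is inside another function, the last parenthesis gets captured and removed. Is there a way to replace the string with the variable value only when it is wrapped in the IDENTIFIER function? I would like this code: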

IDENTIFIER($VAR_NAME)
identifier($VAR_NAME)
SELECT * FROM $VAR_NAME
DATEADD('DAY',+1,$VAR_NAME)

to become:

VAR_NAME
VAR_NAME
SELECT * FROM VAR_NAME
DATEADD('DAY',+1,VAR_NAME)

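This is what I have tried so far: https://regex101.com/r/2SriK9/2 Thanks.

Note: in the last example, if VAR_NAME is a function, the output needs to keep the function call and still end with the closing parenthesis: DATEADD('DAY',+1,MY_FUNC()). Currently my output is DATEADD('DAY',+1,MY_FUNC(), so the DATEADD function is missing its closing parenthesis.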

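Instead of performing complex replacements with regex, you can use a library such as sqlglot to parse the queries into an AST, and then update the AST to produce the desired query: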

import sqlglot

def run_updates(ast, scope):
    if isinstance(ast, list):
        # remove any `SET` queries:
        return [run_updates(i, scope) for i in ast if
                not isinstance(i, sqlglot.expressions.Command) or str(i.this).lower() != 'set']
    if isinstance(ast, sqlglot.expressions.Create):
        if isinstance(ast.this, (sqlglot.expressions.Func, sqlglot.expressions.UserDefinedFunction)):
            # node is a function, save the function name
            scope['functions'].append(ast.this.this.this)
    if isinstance(ast, sqlglot.expressions.Anonymous):
        if ast.this.lower() == 'identifier':
            # remove `IDENTIFIER($VALUE)` expressions
            return sqlglot.expressions.Identifier(this = ast.args['expressions'][0].this.this[1:], quoted=False)
    if isinstance(ast, sqlglot.expressions.Identifier):
        if isinstance(ast.this, str) and ast.this[0] == '$':
            # replace general `$session_variable` expressions
            if ast.this[1:] in scope['functions']:
                # if the session variable is a function, update accordingly
                return sqlglot.expressions.Anonymous(this = ast.this[1:])
            return sqlglot.expressions.Identifier(this = ast.this[1:], quoted=False)
    if hasattr(ast, 'args'):
        # recurse into the child nodes
        ast.args = {a:run_updates(b, scope) for a, b in ast.args.items()}
    return ast

def replace_identifiers(query):
    ast, scope = sqlglot.parse(query), {'variables':{}, 'functions':[]}
    return ';\n'.join(i.sql() for i in run_updates(ast, scope))

Tests:

#replacing IDENTIFIER
s = '''
SET TABLE_NAME = MY_TABLE_NAME;
CREATE OR REPLACE VIEW MY_VIEW AS (
SELECT * FROM IDENTIFIER($TABLE_NAME)
)
'''
print(replace_identifiers(s))
#replacing IDENTIFIER and session variables
s1 = '''
IDENTIFIER($VAR_NAME);
identifier($VAR_NAME);
SELECT * FROM $VAR_NAME;
DATEADD('DAY',1,$VAR_NAME);
'''
print(replace_identifiers(s1))
#replacing session variable as a function call, since the test name is declared as a function in a prior line
s2 = '''
CREATE FUNCTION VAR_NAME() RETURNS TIMESTAMP AS 'SELECT now()';
DATEADD('DAY',1,$VAR_NAME);
'''
print(replace_identifiers(s2))

Output:

CREATE OR REPLACE VIEW MY_VIEW AS (SELECT * FROM TABLE_NAME)
VAR_NAME;
VAR_NAME;
SELECT * FROM VAR_NAME;
DATEADD('DAY', 1, VAR_NAME)
CREATE FUNCTION VAR_NAME() RETURNS TIMESTAMP AS 'SELECT now()';
DATEADD('DAY', 1, VAR_NAME())

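You are looking for two patterns here: one wrapped in the IDENTIFIER function, and one that merely has a leading $ character. You can therefore use an alternation pattern to search for both, capture the variable name in whichever alternative matches, and replace the match with what was captured.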

Find (with the case-insensitive flag):

identifier\(\$(\w+)\)|\$(\w+)

Replace with:

\1\2

Demo: https://regex101.com/r/2SriK9/3
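For reference, the same find/replace could be applied from Python roughly as follows. This is only a minimal sketch: the function name `replace_session_variables` and the optional `values` mapping are hypothetical and not part of the original question or demo. Without a mapping it behaves like the `\1\2` replacement above; with a mapping it substitutes the variable's value instead, which is one possible way to handle the case where the variable stands for a function call.

```python
import re

# Same alternation as above: either IDENTIFIER($NAME) or a bare $NAME.
# Group 1 is set by the first alternative, group 2 by the second.
PATTERN = re.compile(r"identifier\(\$(\w+)\)|\$(\w+)", re.IGNORECASE)


def replace_session_variables(sql, values=None):
    """Replace IDENTIFIER($NAME) and $NAME in `sql`.

    `values` is a hypothetical mapping of variable name to replacement
    text. Names missing from it (or all names, when it is None) are
    replaced by the bare variable name, like the `\\1\\2` replacement.
    """
    values = values or {}

    def substitute(match):
        # Exactly one of the two groups matched; the other is None.
        name = match.group(1) or match.group(2)
        return values.get(name, name)

    return PATTERN.sub(substitute, sql)


if __name__ == "__main__":
    print(replace_session_variables("SELECT * FROM IDENTIFIER($TABLE_NAME)"))
    print(replace_session_variables(
        "DATEADD('DAY',+1,$VAR_NAME)", {"VAR_NAME": "MY_FUNC()"}
    ))
```

Because only the `IDENTIFIER(...)` wrapper's own parentheses are part of the match, the closing parenthesis of an enclosing call such as DATEADD is left in place. Note that this sketch assumes `$` never appears inside string literals or comments in the script; if it can, a parser-based approach is the safer choice.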
