元组索引超出范围-但仅当在函数中运行时



我正在尝试使用SQLParse编写一个进程,该进程列出了SQL语句中存在的表,目前主要关注查询的'FROM'子句。我还试图识别FROM子句中的嵌套查询(或子查询(,并再次运行该过程来识别该嵌套查询中的表。

使用此示例查询

from sqlparse.sql import IdentifierList, Identifier, Function, Where, Parenthesis, TokenList
from sqlparse.tokens import Keyword, DML, Punctuation
sql_2 = """select * from luv_main.test_table left join (select * from luv_all.fake_Table where (a = b)) x  where a = 4 order by A, B, C"""

以下是正在工作的代码:

full_tables = []
tables = []
from_seen = False
for item in parsed.tokens:

#stop the process if the Where statement is reached
if isinstance(item, Where):
from_seen = False

if from_seen:

#multiple tables with Join statements in between, or one table. Doesn't consider subqueries
if isinstance(item, Identifier):

#checks to see if there is a parenthesis, meaning a subquery 
if 'SELECT' in item.value.upper():
subquery = item.value

#returns the db name 
tables.append(item.get_parent_name())

#returns the table name
tables.append(item.get_real_name())
#returns the alias
tables.append(item.get_alias())

full_tables.append(tables)
tables = []


# if multiple tables separated by comma's will be an identifier list. Doesn't consider subqueries
if isinstance(item, IdentifierList):
for identifier in item.get_identifiers():
#returns the db name
tables.append(identifier.get_parent_name())

#returns the table name
tables.append(identifier.get_real_name())

#returns the alias
tables.append(identifier.get_alias())

full_tables.append(tables)
tables = []

else:
if item.ttype is Keyword and item.value.upper() == 'FROM':
from_seen = True

print(full_tables)
print(len(full_tables))

这从查询开始,还通过搜索单词select来识别子查询,然后我就有了这个。

#process of removing outer-most parentheses and identifying aliases that sit outside that window
#new subquery string ready to parse
res_sub = ""
#capture the alias
alias = ""
#record the number of parentheses as they open and close
paren_cnt = 0

for char in subquery:

#if ( and there's already been a ( , include it
if char == '(' and paren_cnt > 0:
res_sub += char

#if (, add to the count
if char == '(':
paren_cnt += 1

# if ) and there's at least 2 (, include it
if char == ')' and paren_cnt > 1:
res_sub += char

# if ), subtract from the count        
if char == ')':
paren_cnt -= 1

# capture the script
if char != '(' and char != ')' and paren_cnt >0:
res_sub += char

# capture the alias
if char != '(' and char != ')'  and char != ' ' and paren_cnt == 0:
alias += char

subparsed = sqlparse.parse(res_sub)[0]

然后,这将删除最外面的括号,并解析为新的SQL语句。这一切都在起作用,如果我手动通过前一块代码运行这个解析后的语句,它就会按预期工作。

然后我尝试将其放入单独的功能中:

  • 首先解析查询并调用:
  • 一个函数,它扫描FROM子句并返回表,但如果它标识了一个子查询,则调用:
  • 一个函数,它删除脚本中最外层的括号,然后调用第一个函数,在整个过程中将其发回

但是当它尝试运行sqlparse.parse(res_sub)[0]时,元组索引超出范围。它不应该是元组,应该是str,然后将其解析为sqlparse.sql.Statement.

我不明白为什么仅仅因为我把它放进了一系列函数中,它的行为就不同了。以下功能代码:

def parse(sql):

parsed = sqlparse.parse(sql)[0]

#call function to assess the FROM statement of the query
assess_from_clause(parsed)
def assess_from_clause(parsed):

full_tables = []
tables = []

from_seen = False
for item in parsed.tokens:
#stop the process if the Where statement is reached
if isinstance(item, Where):
from_seen = False

#checks to see if there is a parenthesis, meaning a subquery 
if 'SELECT' in item.value.upper():
subquery = item.value
subquery_parsing(subquery)

if from_seen:
#multiple tables with Join statements in between, or one table. Doesn't consider subqueries
if isinstance(item, Identifier):

#returns the db name 
tables.append(item.get_parent_name())

#returns the table name
tables.append(item.get_real_name())
#returns the alias
tables.append(item.get_alias())

full_tables.append(tables)
tables = []


# if multiple tables separated by comma's will be an identifier list. Doesn't consider subqueries
if isinstance(item, IdentifierList):
for identifier in item.get_identifiers():
#returns the db name
tables.append(identifier.get_parent_name())

#returns the table name
tables.append(identifier.get_real_name())

#returns the alias
tables.append(identifier.get_alias())

full_tables.append(tables)
tables = []

else:
if item.ttype is Keyword and item.value.upper() == 'FROM':
from_seen = True

print(full_tables)
def subquery_parsing(subquery):

#new subquery string ready to parse
res_sub = ''
#capture the alias
alias = ''
#record the number of parentheses as they open and close
paren_cnt = 0

for char in subquery:
#if ( and there's already been a ( , include it
if char == '(' and paren_cnt > 0:
res_sub += char

#if (, add to the count
if char == '(':
paren_cnt += 1

# if ) and there's at least 2 (, include it
if char == ')' and paren_cnt > 1:
res_sub += char

# if ), subtract from the count        
if char == ')':
paren_cnt -= 1

# capture the script
if char != '(' and char != ')' and paren_cnt >0:
res_sub += char

# capture the alias
if char != '(' and char != ')'  and char != ' ' and paren_cnt == 0:
alias += char

parse(res_sub)

我应该强调的是,我并不精通Python,而且在学习的过程中也学到了很多东西!

感谢

我相信我现在已经解决了这个问题,触发第三个函数的部分触发得太早,并且没有解析代码的子查询。

我已经改变了:

def assess_from_clause(parsed):

full_tables = []
tables = []

from_seen = False
for item in parsed.tokens:
#stop the process if the Where statement is reached
if isinstance(item, Where):
from_seen = False

#checks to see if there is a parenthesis, meaning a subquery 
if 'SELECT' in item.value.upper():
subquery = item.value
subquery_parsing(subquery)

if from_seen:

对此:

def assess_from_clause(parsed):

full_tables = []
tables = []

from_seen = False
for item in parsed.tokens:
#stop the process if the Where statement is reached
if isinstance(item, Where):
from_seen = False

if from_seen:

#checks to see if there is a parenthesis, meaning a subquery 
if 'SELECT' in item.value.upper():
subquery = item.value
subquery_parsing(subquery)

很抱歉,目前对我来说,这是一次试错学习,感谢Barmar的评论。

这对我的库SQLGlot 来说是微不足道的

import sqlglot
import sqlglot.expressions as exp
sql = """select * from luv_main.test_table left join (select * from luv_all.fake_Table where (a = b)) x  where a = 4 order by A, B, C"""
for column in sqlglot.parse_one(sql).find_all(exp.Table):
print(column.text("this"))
fake_Table
test_table

最新更新