同一个 SQL Server 临时表是否可以由多个独立执行的 Python 脚本持久化和使用，而无需每次都重新创建?

我正在从SQL Server数据库中获取数据并将它们保存到文件中，以便在Python中进行后续处理。

我正在使用 Make 来自动获取和重新获取数据(如果某些设置发生更改，则仅重新运行查询的受影响部分，而不是全部)。所以我有一个简单的制作文件，如下所示：

rawdata: datafile1.h5, datafile2.h5 # ... more files like this
datafile1.h5: data1_specific_config.py, common_config.py
python fetch_data1.py
datafile2.h5: data2_specific_config.py, common_config.py
python fetch_data2.py
# ... similar rules for other files

当需要时，我只是运行make rawdata.

现在，脚本执行的所有 SQL 查询fetch_dataN.py都有一个重要的公共部分。示意性地，由fetch_dataN.py运行的queryN如下所示：

select ... into ##common_tmp_table ... /*this is identical for all queries*/
select ... from (... ##common_tmp_table ...) /*this is queryN specific; but the same ##common_tmp_table is used*/

问题是：当我现在在需要重建五个不同数据文件的情况下运行make rawdata时，相同的查询select ... into ##common_tmp_table ...运行五次，并将相同的输出输出到##common_tmp_table中。查询需要相当长的时间才能运行，因此重新执行它五次会显着减慢所有内容。

但是，当一个脚本完成时，临时表总是fetch_dataN.py删除，因为创建它的数据库连接已终止。

问题：

有没有办法强制表##common_tmp_table只创建一次并在make rawdata执行的所有脚本fetch_dataN.py之间持久化？

特别是，有没有办法在make rawdata运行的所有脚本中使用相同的数据库连接？或者，也许打开一个额外的连接，该连接将在所有脚本运行时持续存在，并且可以防止删除全局临时表？

我知道的解决方法：我可以通过在运行make rawdata之前手动创建##common_tmp_table(例如在MS SQL Server Management Studio中)来解决此问题，并保持连接用于此打开状态，直到所有脚本完成。但这显然是丑陋和烦人的。

如果make rawdata可以打开一个单独的进程来打开连接，创建 tmp 表并继续等待，直到其他所有内容完成，那将是一个解决方案。但我不知道这是否可能。

局限性：

我无法在数据库中进行更改(例如创建永久表而不是临时表)
我需要脚本保持独立，以便它们可以通过 make 独立执行(将所有内容放在具有相同数据库连接的脚本中，因此相同的 tmp 表无济于事 - 每当需要重新获取其中一两个数据文件时重建所有数据文件会更慢)

笔记：

MS SQL Server 2008 R2
pyodbc 4.0.28(用于连接到数据库)
python 3.7.6
make 4.3
conda 4.7.12

谢谢。

所以我找到了一个非常好的解决方案：这个想法是让make rawdata执行一个python脚本

打开

数据库连接并将其保持打开状态
创建##common_tmp_table
运行make rawdata_，负责重建数据文件(与make rawdata问题中发布的代码相同，但现在查询中没有select ... into ##common_tmp_table ...)
关闭连接

在代码中：

制作文件：

#THIS IS NEW
.PHONY rawdata # to always rebuild rawdata target
rawdata:
python fetch_all_non_uptodate.py # just call a script that (among other stuff) runs `make rawdata_`
#THE REST IS AS BEFORE (just added underscore)
rawdata_: datafile1.h5, datafile2.h5 # ... more files like this
datafile1.h5: data1_specific_config.py, common_config.py
python fetch_data1.py
datafile2.h5: data2_specific_config.py, common_config.py
python fetch_data2.py
# ... similar rules for other files

fetch_all_non_uptodate.py：

import subprocess
import pyodbc
conn = pyodbc.connect(...) #open db connection
# simulate the run of make with the -q flag to find out whether all the datafiles are up-to-date (return code 0) or not (return code 1); nothing is re-fetched as yet
uptodate = (subprocess.run(['make', '-q', 'rawdata_']).returncode == 0)
# if the raw datafiles are not up-to-date
if not uptodate:    
create_common_tmp_table(conn) # create the ##common_tmp_table in the db and keep it while conn is open
conn.commit() #commit the creation of the tmp table (Important! - otherwise the other connections won't see it!)
subprocess.run(['make', 'rawdata_']) # run make to re-fetch whatever datafiles need to be re-fetched
# the queries can make use of the existing tmp table
# otherwise we just simulate the make output telling that all is up-to-date
else:
print("make: Nothing to be done for 'rawdata'.")
conn.close()

查询N：

/*keep just the specific part - the ##common_tmp_table already exists*/
select ... from (... ##common_tmp_table ...)

相关内容

最新更新

热门标签：