Is it possible to close/re-open the connection when using pd.read_sql with chunking?

Suppose I have a very large table that I want to read with pd.read_sql using chunksize=10000.

What I currently do is:

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('dialect://user:pass@host:port/schema')
with engine.connect() as conn:
    for df in pd.read_sql('SELECT * FROM VERY_LARGE_TABLE', con=conn, chunksize=10000):
        ...  # do stuff with each chunk

My problem here (with Snowflake as the data source) is that the connection will time out during the "do stuff" part.

Is it possible to do something like this instead:

engine = create_engine('dialect://user:pass@host:port/schema', echo=False)

# chunk 1
with engine.connect() as conn:
    df = pd.read_sql('SELECT * FROM VERY_LARGE_TABLE', con=conn)
...  # do stuff

# chunk 2
with engine.connect() as conn:
    df = pd.read_sql('SELECT * FROM VERY_LARGE_TABLE', con=conn)
...  # do stuff

The alternative I'm exploring right now is setting connect_args={"client_session_keep_alive": True} on the engine.
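For reference, a minimal sketch of that alternative, assuming the snowflake-sqlalchemy dialect (the account name and credentials are placeholders):

from sqlalchemy import create_engine

# client_session_keep_alive is forwarded to snowflake.connector.connect()
# and keeps the Snowflake session alive while "do stuff" runs between reads.
engine = create_engine(
    'snowflake://user:pass@account/database/schema',
    connect_args={'client_session_keep_alive': True},
)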

Or could you do something like this:

start = 0
chunk = 5000
while True:
    with engine.connect() as conn:
        # Snowflake uses LIMIT <count> OFFSET <start>, not MySQL's LIMIT start, count
        query = f'SELECT * FROM VERY_LARGE_TABLE LIMIT {chunk} OFFSET {start}'
        df = pd.read_sql(query, con=conn)

    # if df is empty stop querying
    if df.empty:
        break
    else:         # increase start for next iteration
        start += chunk

    ...  # do stuff
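One caveat with LIMIT/OFFSET paging: without an ORDER BY, the database is free to return rows in a different order on each query, so chunks can overlap or miss rows. A hedged sketch of keyset pagination instead, assuming the table has a unique, orderable key (the ID column here is hypothetical):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('dialect://user:pass@host:port/schema')

chunk = 5000
last_id = 0  # assumes ID is a unique, monotonically orderable column (hypothetical)
while True:
    # Fresh connection per chunk, so a long-running "do stuff" never
    # leaves a session idle between queries.
    with engine.connect() as conn:
        query = (
            f'SELECT * FROM VERY_LARGE_TABLE '
            f'WHERE ID > {last_id} ORDER BY ID LIMIT {chunk}'
        )
        df = pd.read_sql(query, con=conn)

    if df.empty:
        break
    last_id = df['ID'].max()

    ...  # do stuff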

Latest update