如何优化500万raw的游标提取



我从MSSQL获得了一个表,有5M行,当我获取这个表的所有行时,这需要我2~3分钟。我想(如果可能的话)优化它。

这是我的代码:

cursor.execute("SELECT * FROM MyTable")
rows = cursor.fetchall() # that takes 2~3 minutes
# some code for setup the output that take only few seconds

I already tried, to used:

while True:
rows = cursor.fetchmany(500000)

if not rows: 
break
# Do some stuff

还有fetchone

但我再次在2-3分钟之间:/如何优化?也许使用线程,但我不知道如何。

谢谢你的帮助。

我认为您可以限制查询返回的行数,即使您必须对数据库进行多次调用。

关于线程,有几种解决方案:
  1. 单个连接,但每个线程使用不同的游标
  2. 为每个线程创建一个连接,并从该连接中创建一个游标

在任何情况下你都需要一个ThreadedConnectionPool。下面是其中一种方法的小示例

import psycopg2
from psycopg2 import pool
from threading import Thread
from time import sleep
threaded_connection_pool = None
thread_table = list()
def get_new_connection():
global threaded_postgreSQL_pool
connection = None
while not isinstance(connection, psycopg2.extensions.connection):
try:
connection = threaded_postgreSQL_pool.getconn()
except pool.PoolError:
sleep(10)  # Wait a free connection
return connection, connection.cursor()

def thread_target():
connection, cursor = get_new_connection()
with connection, cursor:
# Do some stuff
pass

threaded_connection_pool = psycopg2.pool.ThreadedConnectionPool(
# YOUR PARAM
)

for counter_thread in range(10):
thread = Thread(
target=thread_target,
name=f"Thread n°{counter_thread}"
)
thread_table.append(thread)
thread.start()
#
# Do many more stuff
#

for thread in thread_table:
thread.join()
# End

我更喜欢使用第一种解决方案"单个连接,但每个线程使用不同的光标">

For that: I have to do something like that ?

result = []
cursor = connection.cursor()
def fetch_cursor(cursor):
global result
rows = cursor.fetchall()
if rows:
result += beautify_output(rows)
######### THIS CODE BELOW IS INSIDE A FUNCTION ######
thread_table = []
limit = 1000000
offset = 0
sql = "SELECT * FROM myTABLE"
while True:
try:
cursor.execute(f"{sql} LIMIT {limit} OFFSET {offset}")
except Exception as e:
break

offset += limit
thread = Thread(target=fetch_cursor, args=(cursor,))
thread_table.append(thread)
thread.start()
for thread in thread_table:
thread.join()
print(result) 

这样的东西应该可以工作吗?(我明天再试试)

最新更新