Slow insert from a Python list of rows into an Azure SQL database

Hi all, I'm struggling with this problem: I'm trying to insert a Python list of about 100k rows into an Azure database, using the following code:

list_of_rows = [...]
# Ask pyodbc to send the parameters in bulk rather than one row at a time
self.azure_cursor.fast_executemany = True
self.azure_cursor.executemany('''INSERT INTO table_name VALUES(?,?,?,?,?,?,?,?,?,?,?)''', list_of_rows)

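The problem is that this takes a long time (about 43 seconds for 100k rows, less than 30 MB of data), and I don't know how to improve it: I'm already using fast_executemany, and the Azure dashboard shows I'm not reaching the maximum DTU granted by my subscription plan (S1, 20 DTU). I also looked at whether indexes would help, with no benefit (as suggested, I tried running the query in SSMS, without indexes). Finally, the connection is not the problem, since I'm on a 1 Gb/s download/upload link.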

Does anyone know how to improve this performance?
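One way to narrow this down is to time the insert in batches, so it is visible whether the cost grows evenly with the row count or spikes on particular batches. The following is only a minimal sketch: `timed_executemany` and the batch size are hypothetical names and values, not part of the original code, and sqlite3 stands in for the Azure connection purely so the example runs anywhere. The same helper should accept any DB-API cursor that uses `?` placeholders, such as a pyodbc cursor.

```python
import sqlite3
import time


def timed_executemany(cursor, sql, rows, batch_size=10_000):
    """Run executemany in batches; return a list of (rows_sent, seconds)."""
    timings = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        t0 = time.perf_counter()
        cursor.executemany(sql, batch)
        timings.append((len(batch), time.perf_counter() - t0))
    return timings


# Stand-in demo: an in-memory sqlite3 database, NOT Azure SQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table_name (a INTEGER, b TEXT, c REAL)")

demo_rows = [(i, f"row {i}", i * 0.5) for i in range(25_000)]
timings = timed_executemany(
    cur, "INSERT INTO table_name VALUES (?,?,?)", demo_rows
)
conn.commit()

for n, seconds in timings:
    print(f"{n} rows in {seconds:.3f} s")

row_count = cur.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]
```

Against the real database, the per-batch timings could be compared with fast_executemany on and off, or with and without the primary key, to see which change actually moves the numbers.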

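Update

I tried the following code, as suggested on the page linked by Shiraz Bhaiji:

First I create a pandas DataFrame from my list of rows, then I set up the engine and create the event listener, and then I use df.to_sql: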

import urllib.parse
import pandas as pd
import sqlalchemy
from sqlalchemy import event

self.df = pd.DataFrame(data=list_of_rows, columns=['A', 'B', 'C'])
params = 'DRIVER=driver;SERVER=server;PORT=1433;DATABASE=database;UID=username;PWD=password'
db_params = urllib.parse.quote_plus(params)
self.engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect={}".format(db_params))

# Turn on pyodbc's fast_executemany for every executemany call on this engine
@event.listens_for(self.engine, "before_cursor_execute")
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        cursor.fast_executemany = True

self.df.to_sql('table_name', self.engine, index=False, if_exists="append", schema="dbo")

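This code takes the same time as the plain executemany call. I tried dropping the PK (there are no other indexes on the table), which made the insert faster: it now takes 22 seconds, but that is still too long for 100k rows totalling 30 MB.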
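To put those timings in perspective, here is a quick back-of-the-envelope calculation using only the figures quoted in this post (100k rows, at most 30 MB, 43 s and 22 s, a 1 Gb/s link). The helper name `throughput` is just for illustration, and 30 MB is treated as an upper bound on the payload.

```python
ROWS = 100_000
DATA_MB = 30               # upper bound from the post ("less than 30 MB")
LINK_MB_PER_S = 1000 / 8   # 1 Gb/s expressed in MB/s


def throughput(seconds):
    """Return (rows per second, MB per second) for one full insert."""
    return ROWS / seconds, DATA_MB / seconds


for label, seconds in [("with PK", 43), ("PK dropped", 22)]:
    rows_per_s, mb_per_s = throughput(seconds)
    share = mb_per_s / LINK_MB_PER_S
    print(f"{label}: {rows_per_s:.0f} rows/s, "
          f"{mb_per_s:.2f} MB/s, {share:.1%} of the link")
```

Both runs use on the order of 1% of the link, which is consistent with the observation that bandwidth is not the limiting factor here.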

You can speed up the insert by using the to_sql function.

See: https://medium.com/analytics-vidhya/speed-up-bulk-inserts-to-sql-db-using-pandas-and-python-61707ae41990
