Using executemany from pyodbc to load a DataFrame into SQL Server

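I am trying to load data from a DataFrame into SQL Server using pyodbc. pyodbc inserts row by row, which is very slow.

I have tried two approaches I found online (on Medium), but I did not see any performance improvement.

I am running against SQL Azure, so SQLAlchemy is not an easy connection method. The approaches I followed are below. Is there any other way to improve bulk load performance?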

Approach 1

cursor = sql_con.cursor()
cursor.fast_executemany = True
for row_count in range(0, df.shape[0]):
    chunk = df.iloc[row_count:row_count + 1,:].values.tolist()
    tuple_of_tuples = tuple(tuple(x) for x in chunk)
for index,row in ProductInventory.iterrows():
    cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)",tuple_of_tuples)

Approach 2

cursor = sql_con.cursor()
for row_count in range(0, ProductInventory.shape[0]):
    chunk = ProductInventory.iloc[row_count:row_count + 1,:].values.tolist()
    tuple_of_tuples = tuple(tuple(x) for x in chunk)
for index,row in ProductInventory.iterrows():
    cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)",tuple_of_tuples)

Can anyone tell me why the performance has not improved by even 1%? It still takes the same amount of time.

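"I am running against SQL Azure, so SQLAlchemy is not an easy connection method."

Perhaps you just need to get over that hurdle first. Then you can use pandas to_sql together with fast_executemany=True. For example: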

from sqlalchemy import create_engine
#
# ...
#
engine = create_engine(connection_uri, fast_executemany=True)
df.to_sql("table_name", engine, if_exists="append", index=False)

If you have a working pyodbc connection string, you can convert it to a SQLAlchemy connection URI like this:

import urllib.parse
connection_uri = 'mssql+pyodbc:///?odbc_connect=' + urllib.parse.quote_plus(connection_string)
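As a minimal sketch of that conversion: the helper name build_connection_uri is hypothetical, and the driver, server, database, and credentials in the connection string are placeholders rather than values from the question.

```python
import urllib.parse


def build_connection_uri(connection_string: str) -> str:
    """Wrap a raw ODBC connection string in a SQLAlchemy mssql+pyodbc URI."""
    # quote_plus percent-encodes characters such as ';', '=', '{' and '}'
    # so the whole ODBC string travels as a single query parameter.
    return "mssql+pyodbc:///?odbc_connect=" + urllib.parse.quote_plus(connection_string)


# Placeholder values only; substitute your own working pyodbc connection string.
connection_string = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;UID=myuser;PWD=mypassword"
)
connection_uri = build_connection_uri(connection_string)
print(connection_uri)
```

The resulting string is what would be passed to create_engine in place of a hand-written URI.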

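Why are you iterating over ProductInventory twice?
Shouldn't the executemany call happen after the whole tuple of tuples, or a batch of them, has been built?
The pyodbc documentation states that "running executemany() with fast_executemany=False is generally not going to be much faster than running multiple execute() commands directly." So you need to set cursor.fast_executemany = True in both examples (see https://github.com/mkleehammer/pyodbc/wiki/Cursor for more details and examples). I don't know why it was omitted in example 2.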
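To illustrate building the full parameter sequence first and then calling executemany once, here is a minimal sketch. It uses the standard-library sqlite3 module purely as a stand-in so that it runs without a SQL Server; the table name demo and the sample rows are made up. With pyodbc you would use your own connection and set cursor.fast_executemany = True before the call.

```python
import sqlite3

# sqlite3 is only a stand-in here; the executemany call has the same shape in pyodbc.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE demo (x INTEGER, y INTEGER)")

# Stand-in for df.values.tolist(): one inner list per DataFrame row.
rows = [[i, i + 100] for i in range(1, 11)]

# Build the parameter sequence once, outside any loop ...
params = [tuple(row) for row in rows]

# ... then hand all rows to a single executemany call.
cursor.executemany("INSERT INTO demo (x, y) VALUES (?, ?)", params)
conn.commit()

inserted = cursor.execute("SELECT COUNT(*) FROM demo").fetchone()[0]
print(inserted)
```

The point of the sketch is the structure: no executemany call sits inside a per-row loop, so each row is sent exactly once.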

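Here is an example of how to do what I think you are trying to do. math.ceil and the conditional expression in end_idx = ... account for the last batch, which may be smaller than the others. In the example output further down, there are 10 rows and a batch size of 3, so you end up with 4 batches, the last of which contains only 1 tuple.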

import math

df = ProductInventory
batch_size = 500
num_batches = math.ceil(len(df) / batch_size)

cursor = sql_con.cursor()
cursor.fast_executemany = True  # without this, executemany is not much faster than repeated execute calls

for i in range(num_batches):
    start_idx = i * batch_size
    # The last batch may be smaller than batch_size.
    end_idx = len(df) if i + 1 == num_batches else start_idx + batch_size
    tuple_of_tuples = tuple(tuple(x) for x in df.iloc[start_idx:end_idx, :].values.tolist())
    cursor.executemany("INSERT INTO table ([x],[Y]) values (?,?)", tuple_of_tuples)

Example output:

=== Executing: ===
df = pd.DataFrame({'a': range(1,11), 'b': range(101,111)})
batch_size = 3
num_batches = math.ceil(len(df)/batch_size)
for i in range(num_batches):
    start_idx = i * batch_size
    end_idx = len(df) if i + 1 == num_batches else start_idx + batch_size
    tuple_of_tuples = tuple(tuple(x) for x in df.iloc[start_idx:end_idx, :].values.tolist())
    print(tuple_of_tuples)
=== Output: ===
((1, 101), (2, 102), (3, 103))
((4, 104), (5, 105), (6, 106))
((7, 107), (8, 108), (9, 109))
((10, 110),)
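The same batching can also be written with a stepped range, which avoids computing the number of batches up front. This is only a sketch of an alternative, not the method used above; the generator name iter_batches is hypothetical, and a plain list of rows stands in for df.values.tolist().

```python
def iter_batches(rows, batch_size):
    """Yield consecutive batches of rows, each at most batch_size long."""
    for start in range(0, len(rows), batch_size):
        # Slicing past the end of a list is safe in Python,
        # so the final batch is simply shorter.
        yield tuple(tuple(row) for row in rows[start:start + batch_size])


# Same sample data as the example output: 10 rows, batch size 3.
rows = [[a, a + 100] for a in range(1, 11)]
batches = list(iter_batches(rows, 3))
for batch in batches:
    print(batch)
```

With a pyodbc cursor, each yielded batch could be passed to cursor.executemany in place of the print call.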
