Pandas .to_sql() output exceeds size limit



I am using pyodbc and sqlalchemy to insert data into a table in SQL Server, and I am getting this error:

https://i.stack.imgur.com/miSp9.png

Below are snippets of the functions I'm using.

This is the function I use to connect to SQL Server (using fast_executemany):

import pyodbc
from urllib.parse import quote_plus
from sqlalchemy import create_engine

def connect(server, database):
    global cnxn_str, cnxn, cur, quoted, engine
    cnxn_str = ("Driver={SQL Server Native Client 11.0};"
                "Server=<server>;"
                "Database=<database>;"
                "UID=<user>;"
                "PWD=<password>;")
    cnxn = pyodbc.connect(cnxn_str)
    cur = cnxn.cursor()
    cur.fast_executemany = True
    quoted = quote_plus(cnxn_str)
    engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted),
                           fast_executemany=True)

This is the function I use to query the data and insert it into SQL Server:

def insert_to_sql_server():
    global df, np_array

    # DataFrame df is built from a numpy array of dtype=object
    df = pd.DataFrame(np_array[1:,], columns=np_array[0])

    # add new columns, data processing
    df['comp_key'] = df['col1'] + "-" + df['col2'].astype(str)
    df['comp_key2'] = df['col3'] + "-" + df['col4'].astype(str) + "-" + df['col5'].astype(str)
    df['comp_statusID'] = df['col6'] + "-" + df['col7'].astype(str)
    convert_dict = {'col1': 'string', 'col2': 'string', ..., 'col_n': 'string'}

    # convert column dtypes from object to string
    df = df.astype(convert_dict)
    connect(<server>, <database>)
    cur.rollback()
    # Delete old records
    cur.execute("DELETE FROM <table>")
    cur.commit()
    # Insert the dataframe into the table
    df.to_sql(<table name>, engine, index=False,
              if_exists='replace', schema='dbo', chunksize=1000, method='multi')

The insert function runs for about 30 minutes before finally returning the error message.

I don't get the error with smaller DataFrames. The current df is 27963 rows by 9 columns. I suspect one factor causing the error is the length of the strings. By default, the numpy array is dtype='
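As a quick check of that string-length hypothesis, the longest value in each column can be inspected with a small pandas snippet (a diagnostic sketch, not from the original post, assuming df already holds the converted string columns):

# Longest string per column, sorted so the widest columns come first
max_lens = df.astype(str).apply(lambda col: col.str.len().max())
print(max_lens.sort_values(ascending=False))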

I honestly don't know, because the error seems to refer to a limit in either pandas or SQL Server, and I'm not familiar with it.

Thanks!

Thanks for all the input (still new here)! I stumbled upon a solution, which was to reduce the chunksize in df.to_sql from

df.to_sql(chunksize=1000)

to

df.to_sql(chunksize=200)

After some digging, it turns out there is a limit on the SQL Server side (https://discuss.dizzycoding.com/to_sql-pyodbc-count-field-incorrect-or-syntax-error/): a single statement accepts at most 2100 bound parameters, and with method='multi' every cell in a chunk is sent as one parameter.
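So chunksize × number of columns has to stay below 2100. A minimal sketch of deriving a safe chunk size from that limit (not from the original answer; subtracting one row is just a conservative margin):

# SQL Server caps a single statement at 2100 bound parameters.  With
# method='multi', each row in a chunk contributes one parameter per
# column, so size the chunks accordingly.
safe_chunksize = 2100 // len(df.columns) - 1   # e.g. 9 columns -> 232 rows
df.to_sql(<table name>, engine, index=False, if_exists='replace',
          schema='dbo', chunksize=safe_chunksize, method='multi')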

In my case, I had the same "output exceeds size limit" error, and it was fixed by adding method='multi' to df.to_sql(). I tried the chunksize solution first, but it didn't help. So... if you're in the same situation, give this a try!

with engine.connect().execution_options(autocommit=True) as conn:
    df.to_sql('mytable', con=conn, method='multi', if_exists='replace', index=True)
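Note that the autocommit execution option shown above only exists on SQLAlchemy 1.x; it was removed in SQLAlchemy 2.0. On 2.x, a minimal equivalent sketch is engine.begin(), which opens a transaction and commits it when the block exits cleanly:

# SQLAlchemy 2.x: engine.begin() commits automatically on success
with engine.begin() as conn:
    df.to_sql('mytable', con=conn, method='multi', if_exists='replace', index=True)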

You can display a progress bar while inserting data into a SQL Server table by using the tqdm library. Here is an example:

import pandas as pd
from sqlalchemy import create_engine
from tqdm import tqdm

# Define the SQL Server connection string
server = 'your_server_name'
database = 'your_database_name'
table_name = 'your_table_name'
connection_string = (f'mssql+pyodbc://@{server}/{database}'
                     '?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server')

# Create the SQLAlchemy engine
engine = create_engine(connection_string)

# Set the chunk size
chunk_size = 10000  # Adjust this value based on your requirements

# Get the total number of chunks (rounding up)
total_chunks = len(df) // chunk_size + (len(df) % chunk_size > 0)

# Create a progress bar
progress_bar = tqdm(total=total_chunks, desc='Inserting data', unit='chunk')

# Generator yielding successive slices of the DataFrame
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

# Insert the DataFrame in chunks: replace the table on the first chunk,
# append on the rest
for i, chunk in enumerate(chunker(df, chunk_size)):
    if_exists = 'replace' if i == 0 else 'append'
    # Insert the chunk into the SQL Server table
    chunk.to_sql(name=table_name, con=engine, if_exists=if_exists, index=False)
    # Update the progress bar
    progress_bar.update()

# Close the progress bar
progress_bar.close()

# Print a message when the insertion is complete
print('Data insertion complete.')
