I am trying to write a CSV file into a table in a SQL Server database using Python. I get an error when I pass the parameters, but when I do it manually there is no error. Here is the code I am executing:
cur = cnxn.cursor()  # Get the cursor
csv_data = csv.reader(open('Samplefile.csv'))  # Read the csv
for rows in csv_data:  # Iterate through csv
    cur.execute("INSERT INTO MyTable(Col1,Col2,Col3,Col4) VALUES (?,?,?,?)", rows)
cnxn.commit()
Error:
pyodbc.DataError: ('22001', '[22001] [Microsoft][ODBC SQL Server Driver][SQL Server]String or binary data would be truncated. (8152) (SQLExecDirectW); [Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated. (3621)')
However, when I insert the values manually, it works fine:
cur.execute("INSERT INTO MyTable(Col1,Col2,Col3,Col4) VALUES (?,?,?,?)",'A','B','C','D')
I have already made sure the table exists in the database and the data types are consistent with the data I am passing. The connection and cursor are also correct. The data type of rows is "list".
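Since error 8152 means some value is longer than its target column allows, one way to narrow it down is to measure the widest value in each CSV column and compare that against the column sizes (e.g. from INFORMATION_SCHEMA.COLUMNS). A minimal sketch, where `max_field_lengths` is a hypothetical helper, not part of the original code:

```python
import csv
import io

def max_field_lengths(rows):
    """Return the maximum string length seen in each column position."""
    maxima = []
    for row in rows:
        # Grow the list if a row has more fields than seen so far
        while len(maxima) < len(row):
            maxima.append(0)
        for i, val in enumerate(row):
            maxima[i] = max(maxima[i], len(val))
    return maxima

# In-memory CSV standing in for Samplefile.csv
sample = io.StringIO("A,B,C,D\nAA,A quite long value,C,D\n")
print(max_field_lengths(csv.reader(sample)))  # [2, 18, 1, 1]
```

Any column whose maximum exceeds the `CHARACTER_MAXIMUM_LENGTH` reported by `SELECT COLUMN_NAME, CHARACTER_MAXIMUM_LENGTH FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'MyTable'` is a candidate for the truncation.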
Consider building the query dynamically to ensure the number of placeholders matches your table and CSV file format. Then it's just a matter of making sure your table and CSV file are correct, instead of checking that you typed enough ? placeholders in your code.
The following example assumes
- the CSV file contains column names in the first line
- the connection is already built
- the file name is test.csv
- the table name is MyTable
- Python 3
...
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    query = 'insert into MyTable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    cursor = connection.cursor()
    for data in reader:
        cursor.execute(query, data)
    cursor.commit()
If column names are not included in the file:
...
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    data = next(reader)
    query = 'insert into MyTable values ({0})'
    query = query.format(','.join('?' * len(data)))
    cursor = connection.cursor()
    cursor.execute(query, data)
    for data in reader:
        cursor.execute(query, data)
    cursor.commit()
I modified the code written above by Brian, as the one posted did not work on the delimited files I was trying to upload. The line row.pop() can also be ignored, as it was only necessary for the set of files I was trying to upload.
import csv

def upload_table(path, filename, delim, cursor):
    """
    Function to upload flat file to sqlserver
    """
    tbl = filename.split('.')[0]
    cnt = 0
    with open(path + filename, 'r') as f:
        reader = csv.reader(f, delimiter=delim)
        for row in reader:
            row.pop()  # can be commented out
            row = ['NULL' if val == '' else val for val in row]
            row = [x.replace("'", "''") for x in row]
            out = "'" + "', '".join(str(item) for item in row) + "'"
            out = out.replace("'NULL'", 'NULL')
            query = "INSERT INTO " + tbl + " VALUES (" + out + ")"
            cursor.execute(query)
            cnt = cnt + 1
            if cnt % 10000 == 0:
                cursor.commit()
        cursor.commit()
    print("Uploaded " + str(cnt) + " rows into table " + tbl + ".")
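Because this version builds the SQL by string concatenation, it has to double up single quotes and special-case 'NULL' by hand. A variant using parameter placeholders sidesteps both issues; this is a sketch, and `build_insert`/`normalize_row` are hypothetical helpers, not part of the answer above:

```python
def build_insert(table, n_cols):
    """Build a parameterized INSERT with one ? placeholder per column."""
    placeholders = ','.join('?' * n_cols)
    return "INSERT INTO " + table + " VALUES (" + placeholders + ")"

def normalize_row(row):
    """Map empty CSV fields to None so the driver sends real NULLs."""
    return [None if val == '' else val for val in row]

print(build_insert("MyTable", 3))     # INSERT INTO MyTable VALUES (?,?,?)
print(normalize_row(['A', '', 'C']))  # ['A', None, 'C']
```

With pyodbc, the loop body would then become `cursor.execute(build_insert(tbl, len(row)), normalize_row(row))`, and neither the quote doubling nor the 'NULL' replacement is needed, since the driver handles quoting and NULLs itself.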
You can pass the columns as parameters. For example:
for rows in csv_data:  # Iterate through csv
    cur.execute("INSERT INTO MyTable(Col1,Col2,Col3,Col4) VALUES (?,?,?,?)", *rows)
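Here `*rows` unpacks the list into separate positional arguments, so this call is equivalent to the manual version with 'A', 'B', 'C', 'D'. A small pure-Python illustration of the unpacking, using a hypothetical stand-in for cursor.execute (no database needed):

```python
def fake_execute(sql, *params):
    """Stand-in for cursor.execute: just collects positional parameters."""
    return params

rows = ['A', 'B', 'C', 'D']
# Unpacking the list passes four separate arguments...
unpacked = fake_execute("INSERT ...", *rows)
# ...exactly as if they had been written out by hand.
manual = fake_execute("INSERT ...", 'A', 'B', 'C', 'D')
print(unpacked == manual)  # True
```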
If you are using MySqlHook in Airflow, and cursor.execute() with params throws the error
TypeError: not all arguments converted during string formatting
use %s
instead of ?
with open('/usr/local/airflow/files/ifsc_details.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    columns = next(csv_reader)
    query = '''insert into ifsc_details({0}) values({1});'''
    query = query.format(','.join(columns), ','.join(['%s'] * len(columns)))
    mysql = MySqlHook(mysql_conn_id='local_mysql')
    conn = mysql.get_conn()
    cursor = conn.cursor()
    for data in csv_reader:
        cursor.execute(query, data)
    conn.commit()  # MySQL cursors have no commit(); commit on the connection
I sorted it out. The error was due to the size limits on the table's columns. I increased the capacity of the columns, e.g. from col1 VARCHAR(10) to col1 VARCHAR(35), etc. Now it works fine.
Here is the script, hope this works for you:
import pandas as pd
import pyodbc as pc

connection_string = "Driver=SQL Server;Server=localhost;Database={0};Trusted_Connection=Yes;"
cnxn = pc.connect(connection_string.format("DataBaseNameHere"), autocommit=True)
cur = cnxn.cursor()
df = pd.read_csv("your_filepath_and_filename_here.csv").fillna('')
query = 'insert into TableName({0}) values ({1})'
query = query.format(','.join(df.columns), ','.join('?' * len(df.columns)))
cur.fast_executemany = True
cur.executemany(query, df.values.tolist())
cnxn.close()
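Note that `.fillna('')` turns missing values into empty strings rather than SQL NULLs. If NULLs are wanted instead, NaN cells can be mapped to None before `executemany`. A minimal sketch, where `nan_to_none` is a hypothetical helper (the sample rows mimic what `df.values.tolist()` produces once pandas hits a missing cell):

```python
import math

def nan_to_none(rows):
    """Replace NaN cells with None so the driver inserts NULL instead of ''."""
    return [
        [None if (isinstance(v, float) and math.isnan(v)) else v for v in row]
        for row in rows
    ]

rows = [['A', 1.0], ['B', float('nan')]]
print(nan_to_none(rows))  # [['A', 1.0], ['B', None]]
```

With that, the call becomes `cur.executemany(query, nan_to_none(df.values.tolist()))`, dropping the `.fillna('')` from the read step.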
You can also import data into SQL using:
- SQL Server Import and Export Wizard
- SQL Server Integration Services (SSIS)
- The OPENROWSET function
More details can be found on this web page: https://learn.microsoft.com/en-us/sql/relational-databases/import-export/import-data-from-excel-to-sql?view=sql-server-2017
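Along the same lines, the T-SQL BULK INSERT statement can load a CSV in one shot, provided the file path is readable by the SQL Server service itself (not just by the Python client). A hedged sketch of building such a statement to run through the existing pyodbc cursor; the table name and path are placeholders:

```python
def bulk_insert_sql(table, server_side_path):
    """Build a T-SQL BULK INSERT statement for a comma-delimited file.

    Note: the path must be visible to the SQL Server instance itself;
    FIRSTROW = 2 skips a header line.
    """
    return (
        "BULK INSERT " + table +
        " FROM '" + server_side_path + "'"
        " WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2)"
    )

stmt = bulk_insert_sql("MyTable", "C:\\data\\Samplefile.csv")
print(stmt)
# Then, with the connection from the question: cur.execute(stmt); cnxn.commit()
```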