使用pymssql将CSV转换为MSSQL



动机是在我的CSV中不断寻找新记录,并使用pymssql库将记录插入mssql。CSV最初有244行,我试图插入1个值,并且希望只有在使用调度程序运行脚本时才动态插入新行。我有一个每15秒运行一次的脚本来插入值,但在第一次插入值后,第二次脚本抛出"Cannot insert duplicate key in object",因为我的第一列DateID设置了PK,并终止了第一条记录本身的语句,因此不插入新行。

我怎么会遇到这种情况。

代码:

def trial():
try: 
for row in df.itertuples():

datevalue = datetime.datetime.strptime(row.OrderDate, format)
query= "INSERT INTO data (OrderDate, Region, City, Category) VALUES (%s,%s,%s,%s)"
cursor.execute(query, (datevalue, row.Region,row.City,row.Category))
print('"Values inserted')
conn.commit()
conn.close()
except Exception as e:
print("Handle error", e)
pass

schedule.every(15).seconds.do(trial)

使用的库:pymssqlSQL:MSSQL服务器2019

为了避免重复的值,请考虑将查询调整为针对实际数据使用EXCEPT子句(UNIONINTERSECT集合运算符族的一部分(。此外,还可以考虑使用executemany,方法是将所有行/列数据的嵌套列表与DataFrame.to_numpy().tolist()一起传递。

顺便说一句,如果OrderDate列在数据帧和数据库表中是datetime类型,则不需要重新格式化为字符串值。

def trial():
try: 
query= (
"INSERT INTO data (OrderDate, Region, City, Category) "
"SELECT %s, %s, %s, %s "
"EXCEPT "
"SELECT OrderDate, Region, City, Category "
"FROM data"
)
vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
vals = tuple(map(tuple, vals))
cur.executemany(query, vals)
print('Values inserted')
conn.commit()
except Exception as e:
print("Handle error", e)
finally:
cur.close()
conn.close()

对于更快的大容量插入,可以考虑使用临时表:

# CREATE EMPTY TEMP TABLE 
query = "SELECT TOP 0 OrderDate, Region, City, Category INTO #pydata FROM data"
cur.execute(query)
# INSERT INTO TEMP TABLE
query= (
"INSERT INTO #pydata (OrderDate, Region, City, Category) "
"VALUES (%s, %s, %s, %s) "
)
vals = df[["OrderDate", "Region", "City", "Category"]].to_numpy()
vals = tuple(map(tuple, vals))
cur.execute("BEGIN TRAN")
cur.executemany(query, vals)
# MIGRATE TO FINAL TABLE
query= (
"INSERT INTO data (OrderDate, Region, City, Category) "
"SELECT OrderDate, Region, City, Category "
"FROM #pydata "
"EXCEPT "
"SELECT OrderDate, Region, City, Category "
"FROM data"
)
cur.execute(query)
conn.commit()
print("Values inserted")

最新更新