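Saving data to PostgreSQL from Python: malformed array literal error

I'm trying to learn how to save a dataframe created in pandas to a PostgreSQL database (hosted on Azure). I plan to start with simple dummy data: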

import pandas as pd

data = {
    'a': ['x', 'y'],
    'b': ['z', 'p'],
    'c': [3, 5],
}
df = pd.DataFrame(data, columns=['a', 'b', 'c'])

I found a function that pushes the df data into a PostgreSQL table. It starts by defining the connection:

import sys
import psycopg2

def connect(params_dic):
    """Connect to the PostgreSQL database server."""
    conn = None
    try:
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg2.connect(**params_dic)
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
        sys.exit(1)
    print("Connection successful")
    return conn

conn = connect(param_dic)

*param_dic contains all the connection details (user/pass/host/db).

Once the connection is established, I define the execute function:

def execute_many(conn, df, table):
    """
    Using cursor.executemany() to insert the dataframe
    """
    # Create a list of tuples from the dataframe values
    tuples = [tuple(x) for x in df.to_numpy()]
    # Comma-separated dataframe columns
    cols = ','.join(list(df.columns))
    # SQL query to execute
    query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)
    cursor = conn.cursor()
    try:
        cursor.executemany(query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        cursor.close()
        return 1
    print("execute_many() done")
    cursor.close()

I ran this function against the PostgreSQL table I had created in the database:

execute_many(conn,df,"raw_data.test")

The table raw_data.test consists of the columns a (char[]), b (char[]) and c (numeric). When I run the code, I get the following in the console:

Connecting to the PostgreSQL database...
Connection successful
Error: malformed array literal: "x"
LINE 1: INSERT INTO raw_data.test(a,b,c) VALUES('x','z',3)
^
DETAIL:  Array value must start with "{" or dimension information.

I don't know how to interpret this, because none of the columns in df is an array:

df.dtypes
Out[185]: 
a    object
b    object
c     int64
dtype: object

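Any idea what is wrong, or suggestions on how to save a df to PostgreSQL in a simpler way? I found many solutions that use sqlalchemy and build a connection string like this: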

conn_string = 'postgres://user:password@host/database'

But I'm not sure whether that works for a cloud database: when I tried editing such a connection string with the Azure host details, it did not work.

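The usual data types for strings in PostgreSQL are TEXT, VARCHAR(n) or CHAR(n), with round parentheses, and not CHAR[] with square brackets. Square brackets declare an array column, which is why the server tries to parse 'x' as an array literal and rejects it.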

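I'm guessing that you want the column to contain a string and that CHAR[] is a typo; in that case, you'll want to recreate (or migrate) the table columns to the correct type, most likely TEXT.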
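
As a rough sketch of that migration, assuming the table is `raw_data.test` from the question and that `conn` is an open psycopg2 connection, the two `char[]` columns could be converted to `text` in place. The helper name `convert_columns_to_text` is made up for illustration, and the `USING array_to_string(...)` clause is an assumption about how any existing array values should be flattened. On an empty test table, dropping and recreating the table would work just as well.

```python
# Hypothetical helper: convert the char[] columns from the question to text.
# Table and column names come from the question; adjust them for a real schema.
ALTER_STATEMENTS = [
    # array_to_string(col, '') joins the array elements into one plain string,
    # so an existing value such as {x} becomes x.
    "ALTER TABLE raw_data.test ALTER COLUMN a TYPE text USING array_to_string(a, '')",
    "ALTER TABLE raw_data.test ALTER COLUMN b TYPE text USING array_to_string(b, '')",
]


def convert_columns_to_text(conn):
    """Run the ALTER TABLE statements on an open DB-API connection."""
    cursor = conn.cursor()
    try:
        for statement in ALTER_STATEMENTS:
            cursor.execute(statement)
        conn.commit()
    except Exception:
        # Undo any partial change before re-raising the error.
        conn.rollback()
        raise
    finally:
        cursor.close()
```

Once the columns are plain text, the `execute_many(conn, df, "raw_data.test")` call from the question should be able to insert 'x' and 'z' as ordinary strings.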

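(For fixed-length data, if it is truly fixed-length, CHAR(n) can be used; VARCHAR(n) is mostly of historical interest. In most cases, use TEXT.)

Alternatively, if you really do want the column to be an array, you need to pass a list from Python in that position.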
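
If the array columns are intentional, a minimal sketch of that second option is to wrap each string in a list before building the tuples, since psycopg2 adapts a Python list to a PostgreSQL array. The `rows` data below mirrors the dummy dataframe from the question, and the helper name `to_array_rows` is hypothetical.

```python
# Rows equivalent to the dummy dataframe in the question.
rows = [('x', 'z', 3), ('y', 'p', 5)]


def to_array_rows(rows):
    """Wrap the first two values of each row in a list for the char[] columns."""
    return [([a], [b], c) for a, b, c in rows]


tuples = to_array_rows(rows)
print(tuples)  # [(['x'], ['z'], 3), (['y'], ['p'], 5)]
```

These tuples can then be passed to `cursor.executemany` with the same query as before. Keep in mind that each element of a `char[]` column is a single character, so this only suits one-character values like the ones in the dummy data.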

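Consider adjusting your parameterization approach, since psycopg2 supports a better way to format identifiers such as table or column names in SQL statements.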

In fact, the docs indicate that your current approach is not optimal and poses a security risk:

# This works, but it is not optimal
query = "INSERT INTO %s(%s) VALUES(%%s,%%s,%%s)" % (table, cols)

Instead, use the psycopg2.sql module:

from psycopg2 import sql
...
query = (
    sql.SQL("insert into {} values (%s, %s, %s)")
    .format(sql.Identifier('table'))
)
...
cur.executemany(query, tuples)

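Also, as a SQL best practice, always include the column names in append (INSERT) queries rather than relying on the column order of the stored table: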

query = (
    sql.SQL("insert into {0} ({1}, {2}, {3}) values (%s, %s, %s)")
    .format(
        sql.Identifier('table'),
        sql.Identifier('col1'),
        sql.Identifier('col2'),
        sql.Identifier('col3')
    )
)
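
Applied to the function in the question, a sketch along these lines builds the statement from the dataframe's own column names instead of hard-coding them. The helper names `split_qualified_name` and `build_insert` are made up here, and passing several strings to `sql.Identifier` to express a schema-qualified name assumes psycopg2 2.8 or newer.

```python
def split_qualified_name(table):
    """Split 'schema.table' into its parts; a bare table name yields one part.

    This is a naive split and assumes the names themselves contain no dots.
    """
    return tuple(table.split('.'))


def build_insert(table, columns):
    """Compose an INSERT statement with safely quoted identifiers."""
    # Imported here so the helper above stays usable without psycopg2 installed.
    from psycopg2 import sql

    return sql.SQL("insert into {table} ({cols}) values ({values})").format(
        # Identifier('raw_data', 'test') renders as "raw_data"."test"
        table=sql.Identifier(*split_qualified_name(table)),
        cols=sql.SQL(', ').join(map(sql.Identifier, columns)),
        # One %s placeholder per column.
        values=sql.SQL(', ').join(sql.Placeholder() * len(columns)),
    )
```

Inside `execute_many`, the query line could then become `query = build_insert(table, list(df.columns))`, with `cursor.executemany(query, tuples)` left unchanged.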

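Finally, stop using % for string formatting in all of your Python code (not just with psycopg2). As of Python 3, this method has been de-emphasized, though not yet deprecated! Instead, use str.format (Python 2.6+) or f-strings (Python 3.6+).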
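
For illustration, here is the error message from the question's `execute_many` written all three ways; the `error` value is a stand-in. This advice is about ordinary Python strings: the `%s` markers inside a query passed to `cursor.execute` or `cursor.executemany` are psycopg2 parameter placeholders, not Python formatting, and stay as they are.

```python
error = 'malformed array literal: "x"'  # stand-in for the caught exception

old_style = "Error: %s" % error           # printf-style formatting
with_format = "Error: {}".format(error)   # str.format, Python 2.6+
with_fstring = f"Error: {error}"          # f-string, Python 3.6+

# All three produce the same text.
assert old_style == with_format == with_fstring
print(with_fstring)
```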
