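Whenever I try to insert data from a pandas DataFrame into a PostgreSQL database, I get this error: error: extra data after last expected column CONTEXT: COPY recommendations, line 1: "0,4070,"[5963, 8257, 9974, 7546, 11251, 5203, 102888, 8098, 101198, 10950]""
The DataFrame has three columns: the first and second are integers, and the third is a list of integers.
I created a table in PostgreSQL using the function below: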
import psycopg2

# HOST, DATABASE_NAME, USER, PASSWORD, PORT and logger are defined elsewhere.

def create_table(query: str) -> None:
    """
    :param query: A string of the query to create table in the database
    :return: None
    """
    conn = None  # keeps the finally block safe if connect() itself fails
    try:
        logger.info("Creating the table in the database")
        conn = psycopg2.connect(host=HOST, dbname=DATABASE_NAME, user=USER, password=PASSWORD, port=PORT)
        cur = conn.cursor()
        cur.execute(query)
        conn.commit()
        logger.info("Successfully created a table in the database using this query {}".format(query))
    except (Exception, psycopg2.Error) as e:
        logger.error("An error occurred while creating a table using the query {} with exception {}".format(query, e))
    finally:
        if conn is not None:
            conn.close()
            logger.info("Connection closed!")
The query passed to this function is as follows:
create_table_query = '''CREATE TABLE Recommendations
    (id INT NOT NULL,
    applicantId INT NOT NULL,
    recommendation INTEGER[],
    PRIMARY KEY(id),
    CONSTRAINT applicantId
        FOREIGN KEY(applicantId)
        REFERENCES public."Applicant"(id)
        ON DELETE CASCADE
        ON UPDATE CASCADE
    ); '''
Then I use the function below to copy the DataFrame into the table created in Postgres.
import os
import pandas as pd

def copy_from_file(df: pd.DataFrame, table: str = "recommendations") -> None:
    """
    Save the dataframe to disk as a csv file, load the csv file
    and use copy_from() to copy it to the table
    """
    conn = psycopg2.connect(host=HOST, dbname=DATABASE_NAME, user=USER, password=PASSWORD, port=PORT)
    # Save the dataframe to disk
    tmp_df = "./tmp_dataframe.csv"
    df.to_csv(tmp_df, index_label='id', header=False)
    cursor = conn.cursor()
    try:
        with open(tmp_df, 'r') as f:
            cursor.copy_from(f, table, sep=",")
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        logger.error("Error: %s" % error)
        conn.rollback()
    finally:
        # Clean up exactly once, whether or not the copy succeeded
        cursor.close()
        conn.close()
        os.remove(tmp_df)
    logger.info("copy_from_file() done")
But I still get this error: extra data after last expected column CONTEXT: COPY recommendations, line 1: "0,4070,"[5963, 8257, 9974, 7546, 11251, 5203, 102888, 8098, 101198, 10950]""
Any suggestions on how to solve this? Thanks.
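copy_from uses the text format, not the csv format. You told it to use , as the separator, but that doesn't change the quoting method it tries to apply. So the commas inside the quotes are not treated as protected; they are treated as field separators, and of course there are too many of them.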
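To make the field count concrete, here is a rough illustration using only the Python standard library. It is an analogy, not PostgreSQL's actual parser: `str.split` stands in for a reader that ignores quotes (the text format relies on backslash escapes instead), and `csv.reader` stands in for the csv format. The sample line is the one from the error message.

```python
import csv
import io

# The line that df.to_csv() wrote and COPY complained about.
line = '0,4070,"[5963, 8257, 9974, 7546, 11251, 5203, 102888, 8098, 101198, 10950]"'

# Quote-unaware split on the delimiter: every comma starts a new field.
naive_fields = line.split(",")

# CSV-aware parsing: the quoted value stays together as a single field.
csv_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields))  # 12 fields, but the table only has 3 columns
print(len(csv_fields))    # 3 fields
```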
I think you need to use copy_expert and tell it to use the csv format.
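A minimal sketch of what that might look like, under a few assumptions that are not in the question: the function and helper names (`copy_df_as_csv`, `to_pg_array`) are made up for illustration, the caller passes in an already open psycopg2 connection, and the list column is named `recommendation` as in the CREATE TABLE query. Note one extra step: pandas writes a Python list as `[1, 2, 3]`, whereas PostgreSQL array literals use braces (`{1,2,3}`), so the list column probably has to be converted before copying, otherwise the csv format is likely to fail next with a malformed array literal error.

```python
import io


def to_pg_array(values) -> str:
    """Render a list of ints as a PostgreSQL array literal, e.g. {1,2,3}."""
    return "{" + ",".join(str(int(v)) for v in values) + "}"


def copy_df_as_csv(conn, df, table: str = "recommendations") -> None:
    """Copy df (a pandas DataFrame) into table via COPY ... WITH (FORMAT csv).

    conn is an open psycopg2 connection; closing it is left to the caller.
    """
    out = df.copy()
    # Assumption: the list column is called "recommendation".
    out["recommendation"] = out["recommendation"].apply(to_pg_array)

    # Write the CSV into memory instead of a temporary file on disk.
    buf = io.StringIO()
    out.to_csv(buf, index_label="id", header=False)
    buf.seek(0)

    with conn.cursor() as cur:
        try:
            # table is interpolated directly, so it must come from trusted code.
            cur.copy_expert(f"COPY {table} FROM STDIN WITH (FORMAT csv)", buf)
            conn.commit()
        except Exception:
            conn.rollback()
            raise
```

Usage would be along the lines of `copy_df_as_csv(conn, df)` with the same connection parameters as in the question. Because the value `{5963,8257,...}` contains commas, `to_csv` quotes it, and the csv format keeps the quoted value together as the third column.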