Handling np.nan from a pandas DataFrame during a SQLAlchemy ORM bulk insert



I am using the SQLAlchemy ORM to bulk insert a pandas DataFrame into a Microsoft SQL Server database:

import time

from sqlalchemy import create_engine
from sqlalchemy.engine import url
from sqlalchemy.orm import sessionmaker

# my_db_url is a dict of connection parameters (drivername, username, host, ...)
my_engine = create_engine(url.URL(**my_db_url))
Session = sessionmaker(bind=my_engine)
my_session = Session()
start = time.time()
my_session.bulk_insert_mappings(TableObject, mysample)
my_session.commit()
duration = time.time() - start
my_session.close()

Here mysample is a list of dictionaries, created as:

mysample = myDataFrame.to_dict(orient='records')
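
For illustration, this is what the conversion produces for a small, hypothetical two-row DataFrame; note that a missing ScoreVal survives as a float nan, not as None:

import numpy as np
import pandas as pd

# Hypothetical frame with one missing score
df = pd.DataFrame({'Key1': ['A', 'B'], 'ScoreVal': [0.95, np.nan]})
print(df.to_dict(orient='records'))
# [{'Key1': 'A', 'ScoreVal': 0.95}, {'Key1': 'B', 'ScoreVal': nan}]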

The records match TableObject, which is declared as follows:

from sqlalchemy import Column, BigInteger, String, Integer, DateTime, Float
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.schema import PrimaryKeyConstraint

Base = declarative_base()

class TableObject(Base):
    __tablename__ = 'mytable'

    Key1 = Column('Key1', String(1), nullable=False)
    Key2 = Column('Key2', Integer, nullable=False)
    Key3 = Column('Key3', Integer, nullable=False)
    Key4 = Column('Key4', BigInteger, nullable=False)
    SCORE_DATE = Column('SCORE_DATE', DateTime)
    ScoreVal = Column('ScoreVal', Float)

    # A class may define __table_args__ only once; the composite primary key
    # and the schema option must be combined in a single tuple.
    __table_args__ = (
        PrimaryKeyConstraint(Key1, Key2, Key3, Key4),
        {"schema": "dbo"},
    )

ScoreVal can occasionally be np.nan. What is the best way to bulk insert the DataFrame in that case?

As found somewhere on SO, np.nan may need to be replaced with None before the bulk insert. Since mysample is already a list of dicts at that point, the replacement must be applied to the DataFrame before converting it:

mysample = myDataFrame.replace({np.nan: None}).to_dict(orient='records')

This works for both MS SQL Server and Oracle.
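
If the list of record dicts has already been built, the same normalization can also be done in plain Python. This is a sketch with a hypothetical helper name, not code from the original post:

import math

def nan_to_none(records):
    # Copy each record dict, turning float NaN values into None (SQL NULL)
    return [
        {k: None if isinstance(v, float) and math.isnan(v) else v
         for k, v in row.items()}
        for row in records
    ]

mysample = nan_to_none(mysample)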
