Converting millions of JDBC SQL Server table rows to CSV with Python

So my program is having some performance problems. It crashes almost every time I run it; most likely, it can't handle more than a million rows at once. The crash happens when cursor.fetchall() runs in the code below:

def fill_csv_file(cursor, name):
    """
    Fills a csv file with data based on the given SQL table name
    """
    try:
        use_dataset = "USE canvas"
        select_table = "SELECT * FROM " + name
        cursor.execute(use_dataset)
        cursor.execute(select_table)
        LOG.info("Fetching all rows...")
        rows = cursor.fetchall()
        LOG.info("Mapping " + str(len(rows)) + " SQL rows onto CSV")
        if rows:
            result = list()
            column_names = list()
            for i in cursor.description:
                column_name = i[0]
                column_names.append(column_name)
            result.append(column_names)
            for row in rows:
                result.append(row)
            with open(csv_file_path, 'w', newline='') as csvfile:
                csvwriter = csv.writer(csvfile)
                for row in result:
                    csvwriter.writerow(row)
            LOG.info("SQL mapped onto CSV for table name " + name)
        else:
            LOG.info("No rows found for query: {}".format(select_table))
            return
    except Exception as e:
        LOG.error(e)

Is there any other way to do this besides cursor.fetchall()? I tried changing cursor.arraysize to 5000, but the memory problem is still there.
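For reference, the standard DB-API way to stream a result set is cursor.fetchmany(), which pulls a fixed number of rows per call so the whole table never sits in memory. A minimal sketch, reusing the module-level csv, LOG, and csv_file_path from the code above (batch_size is an illustrative parameter, not part of the original code):

import csv

def fill_csv_file(cursor, name, batch_size=5000):
    """
    Streams the given SQL table into a CSV file in fixed-size batches,
    so at most batch_size rows are held in memory at any time.
    """
    cursor.execute("USE canvas")
    cursor.execute("SELECT * FROM " + name)
    with open(csv_file_path, 'w', newline='') as csvfile:
        csvwriter = csv.writer(csvfile)
        # Header row from the cursor metadata.
        csvwriter.writerow([col[0] for col in cursor.description])
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            # writerows() flushes the whole batch in one call.
            csvwriter.writerows(rows)
    LOG.info("SQL mapped onto CSV for table name " + name)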

I also tried something like this, using OUTFILE:

def fill_csv_file(cursor, name):
    """
    Fills a csv file with data based on the SQL table needed
    """
    try:
        LOG.info("Filling CSV with SQL data for " + name + "...")
        use_dataset = "USE canvas"
        outfile = 'SELECT * INTO OUTFILE "' + csv_file_path + '" FROM ' + name + ";"
        print(outfile)
        cursor.execute(use_dataset)
        cursor.execute(outfile)
        LOG.info("Done")
    except Exception as e:
        LOG.error(e)

But I'm not using MySQL, so this won't work.
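SQL Server's closest equivalent to MySQL's OUTFILE is the bcp command-line utility, which exports a table server-side without pulling rows through Python at all. A rough sketch, assuming bcp is installed on the machine and that a creds.db_server attribute holds the host name (neither assumption comes from the original code); note that bcp's character mode writes no header row and does not quote fields containing commas:

import subprocess

def bcp_export(name):
    # Export canvas.dbo.<name> in character mode (-c) with a comma
    # field terminator (-t,), authenticating with the same credentials.
    subprocess.run(
        ["bcp", "canvas.dbo." + name, "out", csv_file_path,
         "-c", "-t,", "-S", creds.db_server,
         "-U", creds.uuid, "-P", creds.password],
        check=True)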

The reason I'm not doing this by hand is that I'm trying to convert a few dozen SQL tables, and the project I'm working on needs Python to automate it. Any suggestions would come in handy!
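Since the goal is a few dozen tables, the table list itself can come from the database rather than being typed out. A sketch using SQL Server's INFORMATION_SCHEMA.TABLES view (picking a distinct output path per table is left open, since csv_file_path is a single module-level value in the code above):

def export_all_tables(cursor):
    # List every base table in the canvas database, then export each one.
    cursor.execute("USE canvas")
    cursor.execute(
        "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
        "WHERE TABLE_TYPE = 'BASE TABLE'")
    table_names = [row[0] for row in cursor.fetchall()]
    for name in table_names:
        fill_csv_file(cursor, name)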

Edit with more code:

def connect_db():
    """
    Connects to the SQL Server database using JayDeBeApi
    """
    try:
        LOG.info("Connecting to database...")
        driver_name = creds.driver_jdbc
        connection_url = creds.db_server_url
        connection_properties = {
            "user": creds.uuid,
            "password": creds.password,
        }
        jar_path = r"C:\Users\Sabri\Documents\GitHub\csv_to_redivis\jtds-1.3.1.jar"
        connection = jaydebeapi.connect(
            driver_name, connection_url, connection_properties, jar_path)
        LOG.info("Connection to DB " + driver_name +
                 " of connection " + connection_url + " successful.")
        return connection
    except Exception as e:
        LOG.error(e)

def one_csv():
    try:
        connection = connect_db()
        cursor = connection.cursor()
        cursor.arraysize = 5000
        fill_csv_file(cursor, 'assignment_group_score_dim')
    except Exception as e:
        LOG.error(e)

Based on the discussion, slightly modifying the code to use cursor.fetchone() instead of cursor.fetchall() keeps the program from crashing, but it's slow because each row is deserialized through the JVM one at a time.

def fill_csv_file(cursor, name):
    """
    Fills a csv file with data based on the given SQL table name
    """
    try:
        use_dataset = "USE canvas"
        select_table = "SELECT * FROM " + name
        cursor.execute(use_dataset)
        cursor.execute(select_table)
        LOG.info("Fetching a single row...")
        row = cursor.fetchone()
        LOG.info("Mapping SQL rows onto CSV")
        if row:
            column_names = []
            for i in cursor.description:
                column_name = i[0]
                column_names.append(column_name)
        else:
            LOG.info("No rows found for query: {}".format(select_table))
            return
        with open(csv_file_path, 'w', newline='') as csvfile:
            csvwriter = csv.writer(csvfile)
            csvwriter.writerow(column_names)
            while row is not None:
                csvwriter.writerow(row)
                row = cursor.fetchone()
        LOG.info("SQL mapped onto CSV for table name " + name)
    except Exception as e:
        LOG.error(e)
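One way to keep the bounded memory of fetchone() while amortizing the per-row JVM crossings is a chunked read through pandas, which fetches rows in batches when chunksize is set. A sketch, assuming the jaydebeapi connection object (not the cursor) is passed in and the table name is fully qualified so no USE statement is needed:

import pandas as pd

def fill_csv_file_pandas(connection, name, batch_size=5000):
    # read_sql with chunksize returns an iterator of DataFrames,
    # each holding at most batch_size rows.
    chunks = pd.read_sql("SELECT * FROM canvas.dbo." + name,
                         connection, chunksize=batch_size)
    for i, chunk in enumerate(chunks):
        # Write the header with the first chunk only, then append.
        chunk.to_csv(csv_file_path, mode='w' if i == 0 else 'a',
                     header=(i == 0), index=False)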

Latest update