如何加快将 CSV 中的行插入 mysql 表的 for 循环?



我在做什么:

  • 我正在从 mysql 表执行查询并将每天的数据导出到一个文件夹中
  • 然后,我使用 for 循环逐行将每个 csv 插入到单独的 mysql 表中
  • 加载到表中后,我将 csv 移动到另一个单独的文件夹中

问题是运行需要很长时间,并希望得到一些帮助,以找出我可以加快该过程的领域或对 Python 中替代方法的建议。

法典:

import pymysql
import pymysql.cursors
import csv
import os
import shutil
import datetime
from db_credentials import db1_config, db2...
def date_range(start, end):
# Creates a list of dates from start to end
...
def export_csv(filename, data):
# Exports query result as a csv to the filename's pending folder
...
def extract_from_db(database, sql, start_date, end_date, filename):
# SQL query to extract data and export as csv
...
def open_csv(c):
# Read csv and return as a list of lists
...
def get_files(folder):
# Grab all csv files from a given folder's pending folder
...
# HERE IS WHERE IT GETS SLOW
def load_to_db(table, folder):
print('Uploading...n')
files = get_files(folder)
# Connect to db2 database
connection = pymysql.connect(**db2_config, charset='utf8mb4', cursorclass=pymysql.cursors.DictCursor)
try:
with connection.cursor() as cursor:
# Open each csv in the files list and ignore column headers
for file in files:                
print('Processing ' + file.split("pending/",1)[1] + '...', end='')
csv_file = open_csv(file)
csv_headers = ', '.join(csv_file[0])
csv_data = csv_file[1:]
# Insert each row of each csv into db2 table
for row in csv_data:
placeholders = ', '.join(['%s'] * len(row))
sql = "INSERT INTO %s (%s) VALUES ( %s )" % (table, csv_headers, placeholders)
cursor.execute(sql, row)
# Move processed file to the processed folder
destination_folder = os.path.join('/Users','python', folder, 'processed')
shutil.move(file, destination_folder)
print('DONE')
# Connection is not autocommit by default.
# So you must commit to save your changes.
connection.commit()
finally:
connection.close()
if not files:
print('No csv data available to process')
else:
print('Finished')

如何尝试 mysql 加载数据

例如,对整个 CSV 而不是单个插入执行以下语句

LOAD DATA INFILE '<your filename>' 
INTO TABLE <your table> 
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"'
LINES TERMINATED BY 'n'
IGNORE 1 ROWS;

最新更新