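I'm very new to both Python and SQLite, so please bear with me, as my approach is certainly fairly crude! I'm trying to use Python 3.6 to append multiple CSV files to an existing table in a SQLite3 database. The code I've written combines the individual CSV files into a single pandas data frame, then cleans it up by adding/combining/deleting columns so that it matches the columns in the SQLite table. It then exports the new data frame as a CSV file. I have managed to add this new CSV file to the existing database, which creates a new table in the database. What I want to do is add the data from the new table to an existing table in the database, so I tried using a UNION statement, but it returns the following error: "ValueError: parameters are of unsupported type". I know that when I look at the new table I created in the database, some columns are of type REAL instead of TEXT (despite converting them all to 'str' before exporting the CSV), whereas all the columns in the table I want to join with UNION are of type TEXT. So I suspect that either this or the UNION statement itself is the problem, but I'm not sure which, and I'm not sure how to fix it. Any help is greatly appreciated!!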
import sqlite3
import os
import pandas as pd
import numpy as np

def add_co2_files_to_database(files=None):
    # If a list of filepaths isn't specified, use every csv file in the same
    # directory as the script
    if files is None:
        # Get the current directory
        cwd = os.getcwd()
        # Add every csv file that starts with 'FD_' to a list
        files = [cwd + '\\' + f for f in os.listdir(cwd) if f.startswith('FD_')]
    # Merge the csv files above into single pandas data frame and add a column
    for file in files:
        df = pd.concat([pd.read_csv(fp).assign(file_name=os.path.basename(fp).split('.')[0])
                        for fp in files])
    # Create a new column called 'FD_serial' from the 'file_name' column
    # that consists of just the FD serial number
    df['FD_serial'] = df['file_name'].str[0:11]
    # Create another column that combines the 'Day', 'Month', and 'Year'
    # columns into 1 column called 'date'
    df['date'] = df['Month'].map(str) + '-' + df['Day'].map(str) + '-' + df['Year'].map(str)
    # Combine columns 'date' and 'Time' into a column called 'date_time'
    # then convert column to datetime format
    df['date_time'] = pd.to_datetime(df['date'] + ' ' + df['Time'])
    # Create new column called 'id' that combines the FD serial number
    # 'FD_serial' and the timestamp 'date_time' so each line of data has a
    # unique identifier in the database
    df['id'] = df['FD_serial'].map(str) + '_' + df['date'].map(str) + '_' + df['Time'].map(str)
    # Add column 'location' and populate with 'NaN'
    df['location'] = np.nan
    # Delete unnecessary columns: 'Month', 'Day', 'Year', 'Time', 'date', 'file_name', 'Mode'
    df = df.drop(["Month", "Day", "Year", "Time", "date", "file_name", "Mode"], axis=1)
    # Rename columns to match the SQLite database conventions
    df = df.rename({'Flux': 'CO2_flux', 'Temperature (C)': 'temp', 'CO2 Soil (ppm) ': 'soil_CO2',
                    'CO2 Soil STD (ppm)': 'soil_STD', 'CO2 ATM (ppm)': 'atm_CO2',
                    'CO2 ATM STD (ppm)': 'atm_std'}, axis='columns')
    # Change data types of all columns to 'str' so it matches the data type in the database
    df = df.astype(str)
    # Save the merged data frame in a csv file called 'FD_CO2_data.csv'
    df.to_csv("FD_CO2_data.csv", index=False)
The following section of code adds the CSV file created above to the database:
    # Connect to the SQLite Database and create a cursor
    conn = sqlite3.connect("email_TEST.db")
    cur = conn.cursor()
    # Read in the csv file 'FD_CO2_data.csv' that was created above
    df = pd.read_csv('FD_CO2_data.csv')
    # Add the csv file to the database as a new table
    df.to_sql('FD_CO2', conn, if_exists='append', index=False)
    #df_db = pd.read_sql_query("select * from FD_CO2 limit 5;", conn)
    cur.execute("SELECT id, FD_serial, date_time, CO2_flux, temp, Soil_CO2, soil_STD, atm_CO2, atm_STD, location FROM CO2 UNION SELECT id, FD_serial, date_time, CO2_flux, temp, Soil_CO2, soil_STD, atm_CO2, atm_STD, location FROM FD_CO2", conn)
    print(df_db)

add_co2_files_to_database()
Inserting the rows from the new table into the existing table should be easy:
cur.execute("INSERT into CO2 select * from FD_CO2")
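This assumes that the columns in FD_CO2 map directly onto the columns in CO2, and that there will be no insert conflicts such as duplicate keys. You will also need a conn.commit() to commit the rows to the database (commit is a method of the connection, not of the cursor).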
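As a minimal sketch of that insert, the rows can be copied with an explicit column list rather than select *. The schemas here are simplified stand-ins, not the real ones: an in-memory database, only three of the real columns, and a primary key on id, none of which the question confirms. INSERT OR IGNORE is one possible way to skip duplicate keys, and the CAST is there only because the question reports REAL columns in the new table.

```python
import sqlite3

# In-memory database so the sketch never touches a real file.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified schemas (assumption: id is the primary key of CO2).
cur.execute("CREATE TABLE CO2 (id TEXT PRIMARY KEY, FD_serial TEXT, CO2_flux TEXT)")
cur.execute("CREATE TABLE FD_CO2 (id TEXT, FD_serial TEXT, CO2_flux REAL)")

# One existing row in CO2, and two new rows, one of which duplicates its id.
cur.execute("INSERT INTO CO2 VALUES ('a_1', 'a', '0.5')")
cur.executemany(
    "INSERT INTO FD_CO2 VALUES (?, ?, ?)",
    [("a_1", "a", 0.5), ("b_1", "b", 1.25)],
)

# Name the columns explicitly; OR IGNORE skips rows whose id already exists.
cur.execute(
    "INSERT OR IGNORE INTO CO2 (id, FD_serial, CO2_flux) "
    "SELECT id, FD_serial, CAST(CO2_flux AS TEXT) FROM FD_CO2"
)
conn.commit()  # commit on the connection, not the cursor

rows = cur.execute("SELECT id, FD_serial, CO2_flux FROM CO2 ORDER BY id").fetchall()
print(rows)
```

Listing the columns on both sides keeps the insert correct even if the two tables declare their columns in a different order.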
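UNION in SQLite is a compound query operator, and it is essentially the same as a union in mathematics. It returns the union of two "sets", that is, of the results of two SELECT statements. It only returns rows; it does not write anything to either table.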
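To illustrate the point about UNION, here is a small sketch using two hypothetical one-column tables, t1 and t2, in an in-memory database. It shows that UNION removes duplicate rows, that UNION ALL keeps them, and that neither changes the underlying tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables that share the value 'b'.
cur.execute("CREATE TABLE t1 (id TEXT)")
cur.execute("CREATE TABLE t2 (id TEXT)")
cur.executemany("INSERT INTO t1 VALUES (?)", [("a",), ("b",)])
cur.executemany("INSERT INTO t2 VALUES (?)", [("b",), ("c",)])

# UNION behaves like a set union: the shared row appears once.
union_rows = cur.execute(
    "SELECT id FROM t1 UNION SELECT id FROM t2 ORDER BY id"
).fetchall()

# UNION ALL simply concatenates the two results, duplicates included.
union_all_rows = cur.execute(
    "SELECT id FROM t1 UNION ALL SELECT id FROM t2 ORDER BY id"
).fetchall()

# The query only returned rows; t1 still holds its original two rows.
t1_count = cur.execute("SELECT COUNT(*) FROM t1").fetchone()[0]

print(union_rows, union_all_rows, t1_count)
```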
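The error "ValueError: parameters are of unsupported type" is caused by the conn argument passed to cur.execute(). The second argument to execute() is only used when the SQL statement is parameterized, that is, when it contains placeholders whose values are supplied as parameters. The Python sqlite3 documentation covers this topic.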
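A short sketch of what the second argument is for, using a hypothetical two-column CO2 table in an in-memory database. The exception is caught broadly here because the exact class raised may differ between Python versions. The question reports ValueError on Python 3.6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified table for illustration only.
cur.execute("CREATE TABLE CO2 (id TEXT, FD_serial TEXT)")
cur.executemany("INSERT INTO CO2 VALUES (?, ?)", [("x_1", "x"), ("y_1", "y")])

# Intended use: a sequence of values bound to the ? placeholders.
matches = cur.execute(
    "SELECT id FROM CO2 WHERE FD_serial = ?", ("x",)
).fetchall()

# Passing the connection as the second argument is rejected, because a
# connection is neither a sequence nor a mapping of parameter values.
try:
    cur.execute("SELECT id FROM CO2", conn)
    error_type = None
except (ValueError, TypeError, sqlite3.Error) as exc:
    error_type = type(exc).__name__

# Without placeholders, simply omit the second argument.
all_ids = cur.execute("SELECT id FROM CO2 ORDER BY id").fetchall()

print(matches, error_type, all_ids)
```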