在之前的程序中,我从csv文件中读取数据,如下所示:
AllData = np.genfromtxt(open("PSECSkew.csv", "rb"),
delimiter=',',
dtype=[('CalibrationDate', datetime),('Expiry', datetime), ('B0', float), ('B1', float), ('B2', float), ('ATMAdjustment', float)],
converters={0: ConvertToDate, 1: ConvertToDate})
我现在正在编写一个令人难以置信的类似程序,但这次我想获得一个非常类似的数据结构,以AllData
(除了浮点数将全部在csv字符串这次),但从SQL Server而不是csv文件。最好的方法是什么?
pyodbc看起来涉及到很多我不熟悉的光标,希望避免使用。我只是想运行查询,并在上面的结构(或像c#中的DataTable)中获取数据。
下面是一个最小的例子,基于你链接到的另一个问题:
import pyodbc
import numpy
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=MyServer;Trusted_Connection=yes;')
cur = conn.cursor()
cur.execute('select object_id from sys.objects')
results = cur.fetchall()
results_as_list = [i[0] for i in results]
array = numpy.fromiter(results_as_list, dtype=numpy.int32)
print array
与此同时,还有更好的办法。查看turbodbc包。要将结果集转换为NumPy数组的OrderedDict,只需执行以下操作:
import turbodbc
connection = turbodbc.connect(dsn="My data source name")
cursor = connection.cursor()
cursor.execute("SELECT 42")
results = cursor.fetchallnumpy()
它也应该比pyodbc快得多(取决于你的数据库,10倍是绝对可能的)。
用熊猫怎么样?例如:
import psycopg2
import pandas
try :
con = psycopg2.connect(
host = "host",
database = "innovate",
user = "username",
password = "password")
except:
print "Could not connect to database."
data = pandas.read_sql_query("SELECT * FROM table", con)
最后,我只是使用pyodbc并遍历游标/结果集,通过大量的试验和错误将每个结果放入手动构建的结构化数组中。如果有更直接的方法,我洗耳恭听!
import numpy as np
import pyodbc as SQL
from datetime import datetime
cxn = SQL.connect('Driver={SQL Server};Server=myServer; Database=myDB; UID=myUserName; PWD=myPassword')
c = cxn.cursor()
#Work out how many rows the query returns in order to initialise the structured array with the correct number of rows
num_rows = c.execute('SELECT count(*) FROM PSECSkew').fetchone()[0]
#Create the structured array
AllData = np.zeros(num_rows, dtype=[('CalibrationDate', datetime),('Expiry', datetime), ('B0', float), ('B1', float), ('B2', float), ('ATMAdjustment', float)])
ConvertToDate = lambda s:datetime.strptime(s,"%Y-%m-%d")
#iterate using the cursor and fill the structred array.
r = 0
for row in c.execute('SELECT * FROM PSECSkew ORDER BY CalibrationDate, Expiry'):
AllData[r] = (ConvertToDate(row[0]), ConvertToDate(row[1])) + row[2:] #Note if you don't need manipulate the data (i.e. to convert the dates in my case) then just tuple(row) would have sufficed
r = r + 1