Iterate a SQL query via PYODBC and collect the results into a Pandas DF



I am trying to run a query repeatedly, once for every date in a date range, and collect the results of each iteration into a Pandas DF.

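I set up a connection (PYODBC) and created a list of dates that I want to run through the SQL query, aggregating the results into a DF. I confirmed that date is a list.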

import pandas as pd
import pyodbc

link = pyodbc.connect(...)  # connection details omitted
date = [d.strftime('%Y-%m-%d') for d in pd.date_range('2020-10-01','2020-10-02')]
type(date)
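As an aside, the same list of date strings can be built without a comprehension by calling DatetimeIndex.strftime on the whole range and converting the result with tolist(). A minimal equivalent, assuming the same two-day range as above:

```python
import pandas as pd

# strftime formats every timestamp in the DatetimeIndex at once;
# tolist() converts the resulting Index of strings into a plain Python list
date = pd.date_range('2020-10-01', '2020-10-02').strftime('%Y-%m-%d').tolist()
print(type(date), date)
```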

I created an empty DF to collect the results of each iteration of the SQL query and checked its structure.

empty = pd.DataFrame(columns = ['Date', 'Balance']) 
empty

I set up the query as follows:

sql = """
Select dt as "Date", sum(BAL)/1000 as "Balance"
From sales as bal
where bal.item IN (1,2,3,4)
AND bal.dt =  '{}'
group by "Date";
""".format(day)

I tried the for loop below, hoping to append the result of each query execution to the empty df, but I still end up with an empty df.

for day in date:
    a = (pd.read_sql_query(sql, link))
    empty.append(a)

Is the problem with the SQL setup and/or the for loop? Is there a better, more efficient way to approach this?

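Avoid the loop and run a single SQL query by adding the date as a GROUP BY column and passing the start and end dates as filter parameters. Also, use parameterization, which pandas.read_sql supports and which is the preferred method, instead of string formatting: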

# PREPARED STATEMENT WITH ? PLACEHOLDERS
sql = """SELECT dt AS "Date"
     , SUM(BAL)/1000 AS "Balance"
FROM sales
WHERE item IN (1,2,3,4)
  AND dt BETWEEN ? AND ?
GROUP BY dt;
"""
# BIND PARAMS TO QUERY, RETURN IN SINGLE DATA FRAME (link IS THE PYODBC CONNECTION)
df = pd.read_sql(sql, link, params=['2020-10-01', '2020-10-02'])
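To see the single-query approach end to end without a database server, here is a minimal sketch that uses an in-memory sqlite3 database as a stand-in for the pyodbc connection. sqlite3 uses the same ? placeholder style. The sample rows are invented purely for illustration; only the table and column names come from the question.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory stand-in for the real database (sample rows are made up)
link = sqlite3.connect(":memory:")
link.execute("CREATE TABLE sales (dt TEXT, item INTEGER, BAL REAL)")
link.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2020-10-01", 1, 1000.0),
        ("2020-10-01", 2, 3000.0),
        ("2020-10-01", 9, 5000.0),  # excluded by the IN (1,2,3,4) filter
        ("2020-10-02", 3, 2000.0),
        ("2020-10-03", 4, 7000.0),  # outside the requested date range
    ],
)

# One query for the whole range: grouping by dt yields one row per day
sql = """SELECT dt AS "Date"
     , SUM(BAL)/1000 AS "Balance"
FROM sales
WHERE item IN (1,2,3,4)
  AND dt BETWEEN ? AND ?
GROUP BY dt
ORDER BY dt"""

# The start and end dates are bound to the two ? placeholders
df = pd.read_sql(sql, link, params=["2020-10-01", "2020-10-02"])
print(df)
```

BETWEEN works here because the dates are stored as ISO-formatted strings, which sort in chronological order.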

The date variable (day) does not appear to be defined when the SQL string is generated. This may help:

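import pandas as pd

def sql_gen(day):
    # Build the query text for a single day
    sql = """
    Select dt as "Date", sum(BAL)/1000 as "Balance"
    From sales as bal
    where bal.item IN (1,2,3,4)
    AND bal.dt =  '{}'
    group by "Date";
    """.format(day)
    return sql

# DataFrame.append returns a new frame instead of modifying empty in place
# (and was removed in pandas 2.0), so collect each result and concatenate once
frames = []
for day in date:
    a = pd.read_sql_query(sql_gen(day), link)
    frames.append(a)
empty = pd.concat(frames, ignore_index=True)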
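If you do want to keep one query per day, the day can be bound as a parameter instead of formatted into the string, and the per-day frames can be concatenated once at the end. Here is a minimal sketch, again using an in-memory sqlite3 database as a stand-in for the pyodbc connection. The sample rows are invented for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory stand-in for the real database (sample rows are made up)
link = sqlite3.connect(":memory:")
link.execute("CREATE TABLE sales (dt TEXT, item INTEGER, BAL REAL)")
link.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("2020-10-01", 1, 1000.0), ("2020-10-01", 2, 3000.0), ("2020-10-02", 3, 2000.0)],
)

date = [d.strftime('%Y-%m-%d') for d in pd.date_range('2020-10-01', '2020-10-02')]

# The day is bound through the ? placeholder on each execution, so no string formatting is needed
sql = """SELECT dt AS "Date", SUM(BAL)/1000 AS "Balance"
FROM sales
WHERE item IN (1,2,3,4)
  AND dt = ?
GROUP BY dt"""

frames = []
for day in date:
    frames.append(pd.read_sql(sql, link, params=[day]))

# A single concat at the end avoids repeatedly copying a growing DataFrame
result = pd.concat(frames, ignore_index=True)
print(result)
```

This still issues one round trip per day, so the single GROUP BY query remains the more efficient option for large ranges.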
