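I am trying to run a query repeatedly for every date in a date range and collect the result of each iteration into a Pandas DF.
I established a connection (PYODBC) and created a list of dates that I want to run through the SQL query and aggregate into a DF. I confirmed that the dates are a list.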
link = pyodbc.connect( Connection Details )
date = [d.strftime('%Y-%m-%d') for d in pd.date_range('2020-10-01','2020-10-02')]
type(date)
I created an empty DF to collect the results of each iteration of the SQL query and checked its structure.
empty = pd.DataFrame(columns = ['Date', 'Balance'])
empty
I set up the query as:
sql = """
Select dt as "Date", sum(BAL)/1000 as "Balance"
From sales as bal
where bal.item IN (1,2,3,4)
AND bal.dt = '{}'
group by "Date";
""".format(day)
I tried the for loop below, hoping to aggregate the result of each query execution into the empty df, but I get back an empty df.
for day in date:
    a = (pd.read_sql_query(sql, link))
    empty.append(a)
Any ideas whether the problem is with the SQL setup and/or the for loop? Is there a better, more efficient way to approach this?
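Avoid the loop and run a single SQL query by adding Date as a GROUP BY column and passing the start and end dates as filter parameters. Also use the preferred parameterization approach, which pandas.read_sql supports, instead of string formatting: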
# PREPARED STATEMENT WITH ? PLACEHOLDERS
sql = """SELECT dt AS "Date"
              , SUM(BAL)/1000 AS "Balance"
         FROM sales
         WHERE item IN (1,2,3,4)
           AND dt BETWEEN ? AND ?
         GROUP BY dt;
"""
# BIND PARAMS TO QUERY AND RETURN RESULTS IN A SINGLE DATA FRAME
# (conn is the open pyodbc connection, named link in the question)
df = pd.read_sql(sql, conn, params=['2020-10-01', '2020-10-02'])
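To try the single-query idea without access to the real database, here is a minimal sketch that uses an in-memory SQLite database in place of the pyodbc connection. The table contents are invented sample rows, and the ORDER BY is added only to make the row order predictable; the same sql string and parameter list could be handed to pd.read_sql as shown above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real pyodbc connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (dt TEXT, item INTEGER, BAL REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2020-10-01", 1, 1000.0),
        ("2020-10-01", 2, 3000.0),
        ("2020-10-02", 3, 5000.0),
        ("2020-10-02", 9, 7000.0),  # item 9 is excluded by the IN filter
        ("2020-10-03", 1, 2000.0),  # falls outside the requested date range
    ],
)

# One query covers the whole range; the two ? placeholders are bound
# to the start and end dates by the driver, not by string formatting.
sql = """SELECT dt AS "Date"
              , SUM(BAL)/1000 AS "Balance"
         FROM sales
         WHERE item IN (1,2,3,4)
           AND dt BETWEEN ? AND ?
         GROUP BY dt
         ORDER BY dt;
"""

rows = conn.execute(sql, ("2020-10-01", "2020-10-02")).fetchall()
for row in rows:
    print(row)  # one (Date, Balance) tuple per day in the range
```

Because the grouping is done by the database, no Python loop or empty collector frame is needed.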
It looks like the day variable is not defined at the point where the sql string is generated. This might help:
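def sql_gen(day):
    # Build the query text for a single day
    sql = """
    Select dt as "Date", sum(BAL)/1000 as "Balance"
    From sales as bal
    where bal.item IN (1,2,3,4)
    AND bal.dt = '{}'
    group by "Date";
    """.format(day)
    return sql

# DataFrame.append returns a new frame instead of modifying `empty` in place
# (and was removed in pandas 2.0), so collect the per-day frames and concatenate.
frames = []
for day in date:
    frames.append(pd.read_sql_query(sql_gen(day), link))
empty = pd.concat(frames, ignore_index=True)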
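If the per-day loop is kept, the day can also be bound with a ? placeholder rather than str.format. The following is only a sketch under assumptions: an in-memory SQLite database replaces the pyodbc connection link, the sample rows are invented, and the GROUP BY uses dt directly. It gathers one frame per day and joins them with pd.concat.

```python
import sqlite3
import pandas as pd

# Assumption: in-memory SQLite replaces the pyodbc connection `link`.
link = sqlite3.connect(":memory:")
link.execute("CREATE TABLE sales (dt TEXT, item INTEGER, BAL REAL)")
link.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2020-10-01", 1, 4000.0),
        ("2020-10-02", 2, 6000.0),
        ("2020-10-02", 7, 9000.0),  # item 7 is excluded by the IN filter
    ],
)

# The day is passed as a bound parameter, so the query text never changes.
sql = """
SELECT dt AS "Date", SUM(BAL)/1000 AS "Balance"
FROM sales
WHERE item IN (1,2,3,4)
  AND dt = ?
GROUP BY dt;
"""

dates = [d.strftime('%Y-%m-%d') for d in pd.date_range('2020-10-01', '2020-10-02')]

# One small frame per day, then a single concatenation at the end.
frames = [pd.read_sql_query(sql, link, params=[day]) for day in dates]
result = pd.concat(frames, ignore_index=True)
print(result)
```

Concatenating once after the loop avoids copying the growing frame on every iteration.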