连接多个数据帧并管理不存在的数据帧



我尝试将一些数据帧(24小时数据的30数据帧(与一些csv自动创建,但有时csv不存在,因此没有创建数据帧(df1、fd2、df4、df8、df9…(。

因此,我想用7个串联的df创建每周数据帧,但当缺少一些df时,函数会出错。但是,如果缺少一些,我希望创建一个空周期的每周df。

dfweek1 = pd.concat([df1, df2, df3, df4, df5, df6, df7])
dfweek2 = pd.concat([df8, df9, df10, df11, df12, df13, df14])
dfweek3 = pd.concat([df15, df16, df17, df18, df19, df20, df21])
dfweek4 = pd.concat([df22, df23, df24, df25, df26, df27, df28])

当我逐行执行测试时:

dfweek2 = pd.concat([df9, df10, df11, df12, df13, df14, df15])
Traceback (most recent call last):
File "/tmp/ipykernel_40958/2223069184.py", line 1, in <cell line: 1>
dfweek2 = pd.concat([df9, df10, df11, df12, df13, df14, df15])
NameError: name 'df14' is not defined

缺少df14,因此没有创建dfweek2。

######################

编辑:这是我所有的代码,更全面,也许(可能(我没有选择最好的方式。。。忽略所有插入的东西,以后(也许(。

######################

from datetime import date, timedelta, datetime
import pandas as pd
import argparse
import locale
locale.setlocale(locale.LC_TIME,'')
def get_arguments():
parser = argparse.ArgumentParser()
parser.add_argument("--date", help="Start Date in YYYY-MM-DD format", required=False, default=None)

return parser.parse_args()
args = get_arguments()
random_date = args.date
#print(args.date)

startdate=pd.to_datetime(args.date,format="%Y-%m-%d").strftime("%Y-%m-%d")
enddate  = pd.to_datetime(startdate) + pd.DateOffset(months=1)
enddate = pd.to_datetime(enddate,format="%Y-%m-%d").strftime("%Y-%m-%d")
filedate = startdate
file=('consommation_horaire_'+str(filedate)+'.csv')
x = 1
df = {}
#ts = {}
#upsampled = {}
#interpol = {}
#recupération d'un mois de fichiers de 24h
while filedate != enddate :
try :
file=('consommation_horaire_'+str(filedate)+'.csv')
#globals()[f"df{x}"] = pd.read_csv(file, sep=",", on_bad_lines='skip', header=0, parse_dates=["date"], names =["date","kwh"])
globals()[f"df{x}"] = pd.read_csv(file, sep=",", on_bad_lines='skip', header=0, names =["date","kwh"])
globals()[f"df{x}"]['date'] = pd.to_datetime(globals()[f"df{x}"]['date'], format='%d/%m/%Y %H:%M')
globals()[f"df{x}"] = globals()[f"df{x}"].set_index('date')
#globals()[f"ts{x}"] = globals()[f"df{x}"]['kwh']
#globals()[f"upsampled{x}"] = globals()[f"ts{x}"].resample('60s')
#globals()[f"interpol{x}"] = globals()[f"upsampled{x}"].interpolate(method='quadratic', order=2)
#print(globals()[f"ts{x}"].head(1))
x += 1
filedate = pd.to_datetime(filedate,format="%Y-%m-%d") + timedelta(days=1)
filedate = pd.to_datetime(filedate,format="%Y-%m-%d").strftime("%Y-%m-%d")
except FileNotFoundError :
print('fichier ' +file+ ' manquant')
x += 1
filedate = pd.to_datetime(filedate,format="%Y-%m-%d") + timedelta(days=1)
filedate = pd.to_datetime(filedate,format="%Y-%m-%d").strftime("%Y-%m-%d")
pass
dfweek1 = pd.concat([df1, df2, df3, df4, df5, df6, df7])
dfweek2 = pd.concat([df8, df9, df10, df11, df12, df13, df14])
dfweek3 = pd.concat([df15, df16, df17, df18, df19, df20, df21])
dfweek4 = pd.concat([df22, df23, df24, df25, df26, df27, df28])

我发现了一些正在工作的东西,即使我认为这不是最好的方法,这是我的代码,有点长:

合成:语法globals(([f;df{x};]对于在循环中自动创建新的数据帧非常有用;df";是数据帧的名称;x〃;是它的编号(见下文(,但您必须首先使用df={}创建一个字典。

注意:与我的第一篇文章不同,在这段代码中,我在开发过程中用常量值"date"替换了"arg"def部分。

#########
date='2021-04-01'
startdate=pd.to_datetime(date,format="%Y-%m-%d").strftime("%Y-%m-%d")
enddate  = pd.to_datetime(startdate) + pd.DateOffset(months=1)
enddate = pd.to_datetime(enddate,format="%Y-%m-%d").strftime("%Y-%m-%d")
#creating dictionaries to generate alldataframe
filedate = startdate
x = 1
df = {}

while filedate != enddate :
try :
file=('consommation_horaire_'+str(filedate)+'.csv')
globals()[f"df{x}"] = pd.read_csv(file, sep=",", on_bad_lines='skip', header=0, names =["date","kwh"])
globals()[f"df{x}"]['date'] = pd.to_datetime(globals()[f"df{x}"]['date'], format='%d/%m/%Y %H:%M')
globals()[f"df{x}"] = globals()[f"df{x}"].set_index('date')
filedate = pd.to_datetime(filedate,format="%Y-%m-%d") + timedelta(days=1)
filedate = pd.to_datetime(filedate,format="%Y-%m-%d").strftime("%Y-%m-%d")
x += 1
except FileNotFoundError :
print('fichier ' +file+ ' manquant')
x += 1
filedate = pd.to_datetime(filedate,format="%Y-%m-%d") + timedelta(days=1)
filedate = pd.to_datetime(filedate,format="%Y-%m-%d").strftime("%Y-%m-%d")
pass

#creating an empty dataframe or completing it if doesn't exist or incomplete
df = {}
days=pd.to_datetime(date,format="%Y-%m-%d").days_in_month
for x in range(1,days):
try:
if globals()[f"df{x}"].shape[0] < 48 :
#filling empty lines
testdate=globals()[f"df{x}"].index[0].strftime("%Y-%m-%d")
testdate1 = pd.to_datetime(testdate,format="%Y-%m-%d %H:%M:%S")
testdate2 = pd.to_datetime(testdate,format="%Y-%m-%d %H:%M:%S")+ timedelta(hours=23.5)
fillindex=pd.date_range(start=testdate1, end=testdate2, freq='30min')
globals()[f"df{x}"]=globals()[f"df{x}"].reindex(fillindex)


else:
pass


except :
globals()[f"df{x}"] = pd.DataFrame(index=range(48),columns=range(1))
globals()[f"df{x}"].columns = ['kwh']
testdate=pd.to_datetime(date,format="%Y-%m-%d %H:%M:%S")+ timedelta(days=x-1)
testdate1 = pd.to_datetime(testdate,format="%Y-%m-%d %H:%M:%S")
testdate2 = pd.to_datetime(testdate,format="%Y-%m-%d %H:%M:%S")+ timedelta(hours=23.5)
fillindex=pd.date_range(start=testdate1, end=testdate2, freq='30min')
globals()[f"df{x}"].index.names = ['date']
globals()[f"df{x}"].index = pd.DatetimeIndex(fillindex)

pass

最新更新