How to use groupby on time of day to create separate DataFrames



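I have this dataset, where the data was collected at 15-minute intervals over 34 days.

How can I get all the data from the same time of day? I have already loaded the dataset and converted the DATE_TIME column to datetime.

I have gotten the following piece of code to work: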

import pandas as pd

tmp = weather_sensor_df()  # user-defined loader; DATE_TIME is already a datetime column
rows = []
for i in tmp.index:
    time = tmp.DATE_TIME[i]
    # keep only the readings taken at exactly 13:00
    if time.hour == 13 and time.minute == 0:
        rows.append({
            'DATE_TIME': time,
            'AMBIENT_TEMPERATURE': tmp.AMBIENT_TEMPERATURE[i],
            'MODULE_TEMPERATURE': tmp.MODULE_TEMPERATURE[i],
            'IRRADIATION': tmp.IRRADIATION[i],
        })
# DataFrame.append was removed in pandas 2.0, so build the frame once from the list
df = pd.DataFrame(rows)
print(df)

For reference: weather_sensor_df() loads the weather sensor DataFrame and uses pd.to_datetime() to convert DATE_TIME to Timestamps.
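The row-by-row loop above can be replaced with a vectorized boolean mask on the .dt.time accessor, which selects every reading taken at 13:00 in one step. This is a minimal sketch: the small frame below is made-up data standing in for weather_sensor_df(), and only the column names are taken from the dataset.

```python
import datetime

import pandas as pd

# Synthetic stand-in for weather_sensor_df(); the values are made up.
tmp = pd.DataFrame({
    "DATE_TIME": pd.to_datetime([
        "2020-05-15 12:45:00", "2020-05-15 13:00:00",
        "2020-05-16 13:00:00", "2020-05-16 13:15:00",
    ]),
    "AMBIENT_TEMPERATURE": [30.1, 30.4, 29.8, 29.9],
    "MODULE_TEMPERATURE": [45.0, 46.2, 44.1, 44.5],
    "IRRADIATION": [0.80, 0.82, 0.79, 0.78],
})

# Boolean mask: True where the clock time is exactly 13:00:00, on any day.
mask = tmp["DATE_TIME"].dt.time == datetime.time(13, 0)

# Keep the matching rows and only the columns the loop was copying.
cols = ["DATE_TIME", "AMBIENT_TEMPERATURE", "MODULE_TEMPERATURE", "IRRADIATION"]
df = tmp.loc[mask, cols].reset_index(drop=True)
print(df)
```

The mask avoids building a DataFrame one row at a time, which is both slow and, since pandas 2.0, no longer possible with DataFrame.append.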

I think the groupby() function would be better suited for this, but I am not sure how to go about it.

DATE_TIME,PLANT_ID,SOURCE_KEY,AMBIENT_TEMPERATURE,MODULE_TEMPERATURE,IRRADIATION
2020-05-15 00:00:00,4135001,HmiyD2TTLFNqkNe,25.184316133333333,22.8575074,0.0
2020-05-15 00:15:00,4135001,HmiyD2TTLFNqkNe,25.08458866666667,22.761667866666663,0.0
2020-05-15 00:30:00,4135001,HmiyD2TTLFNqkNe,24.935752600000004,22.59230553333333,0.0
2020-05-15 00:45:00,4135001,HmiyD2TTLFNqkNe,24.8461304,22.36085213333333,0.0
2020-05-15 01:00:00,4135001,HmiyD2TTLFNqkNe,24.621525357142858,22.165422642857145,0.0
2020-05-15 01:15:00,4135001,HmiyD2TTLFNqkNe,24.5360922,21.968570866666667,0.0
2020-05-15 01:30:00,4135001,HmiyD2TTLFNqkNe,24.638673866666664,22.352925666666668,0.0
2020-05-15 01:45:00,4135001,HmiyD2TTLFNqkNe,24.87302233333333,23.1609192,0.0
2020-05-15 02:00:00,4135001,HmiyD2TTLFNqkNe,24.936930466666663,23.026113,0.0
2020-05-15 02:15:00,4135001,HmiyD2TTLFNqkNe,25.0122476,23.343229266666665,0.0
2020-06-17 21:30:00,4135001,HmiyD2TTLFNqkNe,22.9965616,21.869773466666665,0.0
2020-06-17 21:45:00,4135001,HmiyD2TTLFNqkNe,23.137091,22.1259848,0.0
2020-06-17 22:00:00,4135001,HmiyD2TTLFNqkNe,22.563179466666668,21.164713466666665,0.0
2020-06-17 22:15:00,4135001,HmiyD2TTLFNqkNe,22.19922893333333,20.51527293333333,0.0
2020-06-17 22:30:00,4135001,HmiyD2TTLFNqkNe,22.171736666666664,21.0808288,0.0
2020-06-17 22:45:00,4135001,HmiyD2TTLFNqkNe,22.150569666666662,21.480377266666668,0.0
2020-06-17 23:00:00,4135001,HmiyD2TTLFNqkNe,22.129815666666666,21.38902386666667,0.0
2020-06-17 23:15:00,4135001,HmiyD2TTLFNqkNe,22.008274642857145,20.709211357142856,0.0
2020-06-17 23:30:00,4135001,HmiyD2TTLFNqkNe,21.96949473333333,20.7349628,0.0
2020-06-17 23:45:00,4135001,HmiyD2TTLFNqkNe,21.909287666666668,20.4279724,0.0
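  • Use pandas.DataFrame.groupby on .dt.time
    • .dt.hour can be used instead to group by hour only
  • No aggregation function is specified for the columns, so dfg is a DataFrameGroupBy object.
  • From the GroupBy object, a dict of DataFrames can be created, with the time in isoformat (e.g. 'hh:mm:ss') as the key.
    • If .dt.hour is used for the groups, drop .isoformat() and the keys will be ints (0...23).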
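To see what the bullets describe on a tiny input, here is a self-contained sketch on made-up data (two days, three times per day; not the plant dataset). It shows that the unaggregated grouping is a DataFrameGroupBy, that a single time of day can be pulled out with get_group using a datetime.time key, and what the isoformat keys of the dict look like.

```python
import datetime

import pandas as pd

# Synthetic stand-in for the sensor data (made-up values): two days, three times each.
tmp = pd.DataFrame({
    "DATE_TIME": pd.to_datetime([
        "2020-05-15 00:00:00", "2020-05-15 00:15:00", "2020-05-15 13:00:00",
        "2020-05-16 00:00:00", "2020-05-16 00:15:00", "2020-05-16 13:00:00",
    ]),
    "AMBIENT_TEMPERATURE": [25.2, 25.1, 31.0, 24.9, 24.8, 30.5],
})

# No aggregation is applied, so this is a DataFrameGroupBy object.
dfg = tmp.groupby(tmp.DATE_TIME.dt.time)
print(type(dfg).__name__)

# Pull one time-of-day group directly; the group keys are datetime.time objects.
at_13 = dfg.get_group(datetime.time(13, 0))
print(at_13)

# Or materialize every group into a dict keyed by 'hh:mm:ss' strings.
df_times = {t.isoformat(): data for t, data in dfg}
print(sorted(df_times))
```

get_group is handy when only one time of day is needed; the dict comprehension is the better fit when every time slot will be used.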
import pandas as pd
# load the data
tmp = pd.read_csv('./data/Plant_1_Weather_Sensor_Data.csv')
# set the column as a datetime dtype
tmp.DATE_TIME = pd.to_datetime(tmp.DATE_TIME)
# groupby time
dfg = tmp.groupby(tmp.DATE_TIME.dt.time)
# create a dict of dataframes, where the key is an isoformat datetime.time
df_times = {g.isoformat(): data for g, data in dfg}
# display(df_times['00:15:00'].head())
              DATE_TIME  PLANT_ID       SOURCE_KEY  AMBIENT_TEMPERATURE  MODULE_TEMPERATURE  IRRADIATION
1   2020-05-15 00:15:00   4135001  HmiyD2TTLFNqkNe            25.084589           22.761668          0.0
182 2020-05-17 00:15:00   4135001  HmiyD2TTLFNqkNe            24.011531           21.648279          0.0
278 2020-05-18 00:15:00   4135001  HmiyD2TTLFNqkNe            21.041437           20.475962          0.0
374 2020-05-19 00:15:00   4135001  HmiyD2TTLFNqkNe            22.548998           20.529877          0.0
467 2020-05-20 00:15:00   4135001  HmiyD2TTLFNqkNe            22.255206           20.110174          0.0
# iterate through the dict of dataframes like a normal dict
for k, v in df_times.items():
    print(k)
    print(v.head())
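For the .dt.hour variant mentioned in the bullets, the group keys are already integer hours, so no .isoformat() call is needed. A minimal sketch, assuming a synthetic two-day series of 15-minute timestamps built with pd.date_range (the temperature values are just a counter, not real readings):

```python
import pandas as pd

# Synthetic 15-minute readings over two days (made-up values).
tmp = pd.DataFrame({
    "DATE_TIME": pd.date_range("2020-05-15", periods=2 * 96, freq="15min"),
})
tmp["AMBIENT_TEMPERATURE"] = range(len(tmp))

# Group on the hour only: each group holds every quarter-hour reading in that hour, across all days.
dfg = tmp.groupby(tmp.DATE_TIME.dt.hour)

# The keys are integer hours 0..23, so they can be used directly as dict keys.
df_hours = {h: data for h, data in dfg}
print(len(df_hours))
print(df_hours[13])
```

Each hourly group here contains four readings per day (00, 15, 30 and 45 minutes past the hour), so grouping by hour is coarser than grouping by .dt.time.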
