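I have a dataset that was collected over 34 days at 15-minute intervals.
How can I get all the data recorded at the same time of day? I have already loaded the dataset and converted the date column to a datetime format.
I have the following piece of code working: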
tmp = weather_sensor_df()
# empty frame with the same columns as the source data
df = pd.DataFrame(columns=tmp.columns)
print(df)
tmp.DATE_TIME.dt.hour[13]
for i in tmp.index:
    time = tmp.DATE_TIME[i]
    # keep only the 13:00 reading of each day
    if time.hour == 13 and time.minute == 0:
        row = {
            df.columns[0]: time,
            df.columns[1]: tmp.AMBIENT_TEMPERATURE[i],
            df.columns[2]: tmp.MODULE_TEMPERATURE[i],
            df.columns[3]: tmp.IRRADIATION[i],
        }
        # note: DataFrame.append was removed in pandas 2.0
        df = df.append(row, ignore_index=True)
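For reference: weather_sensor_df() loads the weather sensor DataFrame and uses pd.to_datetime() to convert DATE_TIME to Timestamp format.
I think the groupby() function is better suited to this, but I'm not sure how to proceed.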
DATE_TIME,PLANT_ID,SOURCE_KEY,AMBIENT_TEMPERATURE,MODULE_TEMPERATURE,IRRADIATION
2020-05-15 00:00:00,4135001,HmiyD2TTLFNqkNe,25.184316133333333,22.8575074,0.0
2020-05-15 00:15:00,4135001,HmiyD2TTLFNqkNe,25.08458866666667,22.761667866666663,0.0
2020-05-15 00:30:00,4135001,HmiyD2TTLFNqkNe,24.935752600000004,22.59230553333333,0.0
2020-05-15 00:45:00,4135001,HmiyD2TTLFNqkNe,24.8461304,22.36085213333333,0.0
2020-05-15 01:00:00,4135001,HmiyD2TTLFNqkNe,24.621525357142858,22.165422642857145,0.0
2020-05-15 01:15:00,4135001,HmiyD2TTLFNqkNe,24.5360922,21.968570866666667,0.0
2020-05-15 01:30:00,4135001,HmiyD2TTLFNqkNe,24.638673866666664,22.352925666666668,0.0
2020-05-15 01:45:00,4135001,HmiyD2TTLFNqkNe,24.87302233333333,23.1609192,0.0
2020-05-15 02:00:00,4135001,HmiyD2TTLFNqkNe,24.936930466666663,23.026113,0.0
2020-05-15 02:15:00,4135001,HmiyD2TTLFNqkNe,25.0122476,23.343229266666665,0.0
2020-06-17 21:30:00,4135001,HmiyD2TTLFNqkNe,22.9965616,21.869773466666665,0.0
2020-06-17 21:45:00,4135001,HmiyD2TTLFNqkNe,23.137091,22.1259848,0.0
2020-06-17 22:00:00,4135001,HmiyD2TTLFNqkNe,22.563179466666668,21.164713466666665,0.0
2020-06-17 22:15:00,4135001,HmiyD2TTLFNqkNe,22.19922893333333,20.51527293333333,0.0
2020-06-17 22:30:00,4135001,HmiyD2TTLFNqkNe,22.171736666666664,21.0808288,0.0
2020-06-17 22:45:00,4135001,HmiyD2TTLFNqkNe,22.150569666666662,21.480377266666668,0.0
2020-06-17 23:00:00,4135001,HmiyD2TTLFNqkNe,22.129815666666666,21.38902386666667,0.0
2020-06-17 23:15:00,4135001,HmiyD2TTLFNqkNe,22.008274642857145,20.709211357142856,0.0
2020-06-17 23:30:00,4135001,HmiyD2TTLFNqkNe,21.96949473333333,20.7349628,0.0
2020-06-17 23:45:00,4135001,HmiyD2TTLFNqkNe,21.909287666666668,20.4279724,0.0
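For the specific case in the question (only the 13:00 rows), a boolean mask on the .dt accessor can replace the loop entirely. The following is a minimal sketch, assuming a small synthetic frame in place of the real CSV; the three-day range and the made-up AMBIENT_TEMPERATURE values are illustrative only.

```python
import pandas as pd

# Synthetic stand-in for the sensor data: 3 days at 15-minute intervals
# (the values are made up for illustration).
idx = pd.date_range("2020-05-15", periods=3 * 96, freq="15min")
tmp = pd.DataFrame({"DATE_TIME": idx, "AMBIENT_TEMPERATURE": range(len(idx))})

# Boolean mask: keep only the rows stamped exactly 13:00
mask = (tmp.DATE_TIME.dt.hour == 13) & (tmp.DATE_TIME.dt.minute == 0)
at_1pm = tmp[mask]
print(at_1pm)

# Alternative: with DATE_TIME as the index, DataFrame.at_time selects the same rows
same = tmp.set_index("DATE_TIME").at_time("13:00")
print(same)
```

This selects one time of day. To split the data into every time of day at once, grouping is the more natural fit.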
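- Use pandas.DataFrame.groupby with .dt.time.
  - .dt.hour can be used instead to group by hour.
- No aggregation function is specified for the columns, so dfg is a DataFrameGroupBy object.
- Use the GroupBy object to create a dict of DataFrames, with the time in isoformat (e.g. 'hh:mm:ss') as the key.
  - If .dt.hour is used for the groups, remove .isoformat(), and the keys will be ints (0...23).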
import pandas as pd
# load the data
tmp = pd.read_csv('./data/Plant_1_Weather_Sensor_Data.csv')
# set the column as a datetime dtype
tmp.DATE_TIME = pd.to_datetime(tmp.DATE_TIME)
# groupby time
dfg = tmp.groupby(tmp.DATE_TIME.dt.time)
# create a dict of dataframes, where the key is an isoformat datetime.time
df_times = {g.isoformat(): data for g, data in dfg}
# display(df_times['00:15:00'].head())
               DATE_TIME  PLANT_ID       SOURCE_KEY  AMBIENT_TEMPERATURE  MODULE_TEMPERATURE  IRRADIATION
1    2020-05-15 00:15:00   4135001  HmiyD2TTLFNqkNe            25.084589           22.761668          0.0
182  2020-05-17 00:15:00   4135001  HmiyD2TTLFNqkNe            24.011531           21.648279          0.0
278  2020-05-18 00:15:00   4135001  HmiyD2TTLFNqkNe            21.041437           20.475962          0.0
374  2020-05-19 00:15:00   4135001  HmiyD2TTLFNqkNe            22.548998           20.529877          0.0
467  2020-05-20 00:15:00   4135001  HmiyD2TTLFNqkNe            22.255206           20.110174          0.0
# iterate through the dict of dataframes like a normal dict
for k, v in df_times.items():
    print(k)
    print(v.head())
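The .dt.hour variant from the list above, and the effect of adding an aggregation, can be sketched as follows. This is only an illustration under stated assumptions: the frame is synthetic (two days of made-up IRRADIATION values), and the names df_hours and mean_by_time are hypothetical, not part of the original code.

```python
import pandas as pd

# Synthetic stand-in for the sensor data: 2 days at 15-minute intervals
# (the values are made up for illustration).
idx = pd.date_range("2020-05-15", periods=2 * 96, freq="15min")
tmp = pd.DataFrame(
    {"DATE_TIME": idx, "IRRADIATION": [float(i % 96) for i in range(len(idx))]}
)

# Group by hour: the keys are integers, so no .isoformat() call is needed
df_hours = {hour: data for hour, data in tmp.groupby(tmp.DATE_TIME.dt.hour)}
print(len(df_hours))   # number of distinct hours
print(df_hours[13])    # every reading taken between 13:00 and 13:45

# With an aggregation the result is a Series, one value per time of day
mean_by_time = tmp.groupby(tmp.DATE_TIME.dt.time)["IRRADIATION"].mean()
print(mean_by_time.head())
```

Grouping by hour puts four readings per day into each group, while grouping by .dt.time keeps each 15-minute slot separate, so the choice depends on how fine the comparison needs to be.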