I want to calculate the average for each hour from a CSV file.
Here is my dataset:
Timestamp Temperature
9/1/2016 0:00:08 53.8
9/1/2016 0:00:38 53.8
9/1/2016 0:01:08 53.8
9/1/2016 0:01:38 53.8
9/1/2016 0:02:08 53.8
9/1/2016 0:02:38 54.1
9/1/2016 0:03:08 54.1
9/1/2016 0:03:38 54.1
9/1/2016 0:04:38 54
9/1/2016 0:05:38 54
9/1/2016 0:06:08 54
9/1/2016 0:06:38 54
9/1/2016 0:07:08 54
9/1/2016 0:07:38 54
9/1/2016 0:08:08 54.1
9/1/2016 0:08:38 54.1
9/1/2016 0:09:38 54.1
9/1/2016 0:10:32 54
9/1/2016 0:11:02 54
9/1/2016 0:11:32 54
9/1/2016 0:00:08 54
9/2/2016 0:00:20 32
9/2/2016 0:00:50 32
9/2/2016 0:01:20 32
9/2/2016 0:01:50 32
9/2/2016 0:02:20 32
9/2/2016 0:02:50 32
9/2/2016 0:03:20 32
9/2/2016 0:03:50 32
9/2/2016 0:04:20 32
9/2/2016 0:04:50 32
9/2/2016 0:05:20 32
9/2/2016 0:05:50 32
9/2/2016 0:06:20 32
9/2/2016 0:06:50 32
9/2/2016 0:07:20 32
9/2/2016 0:07:50 32
This is my code for calculating the daily average, but I want it per hour:
from datetime import datetime
import pandas
def same_day(date_string): # Remove year
    return datetime.strptime(date_string, "%m/%d/%Y %H:%M:%S").strftime('%m%d')
df = pandas.read_csv('/home/kk/Desktop/cal_Avg.csv',index_col=0,usecols=[0, 1], names=['Timestamp', 'Discharge'],converters={'Timestamp': same_day})
print(df.groupby(level=0).mean())
The output I want is:
Timestamp Temp * Avg
9/1/2016 0:00:08 53.8
9/1/2016 0:00:38 53.8 ?avg for this hour
9/1/2016 0:01:08 53.8
9/1/2016 0:01:38 53.8 ?avg for this hour
9/1/2016 0:02:08 53.8
9/1/2016 0:02:38 54.1
Now I want the average and minimum for a specific hour.
Expected output:
Here I only print 5 hours of output for the dates 9/1/2016 and 9/2/2016
010900 54.362727 45.497273
010901 54.723276 45.068103
010902 54.746847 45.370270
010903 54.833913 44.931304
010904 54.971053 44.835088
010905 55.519444 44.459259
020901 31.742553 55.640426
020902 31.495556 55.655556
020903 31.304348 55.442609
020904 31.200000 55.437273
020905 31.294382 55.442697
How can I achieve this for a specific date and a specific hour?
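I think you first need read_csv with the parameter index_col=[0], which reads the first column into the index, and parse_dates=[0], which parses that column into a DatetimeIndex:
import pandas as pd
df = pd.read_csv('filename', index_col=[0], parse_dates=[0], usecols=[0,1])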
print (df)
Temperature
Timestamp
2016-09-01 00:00:08 53.8
2016-09-01 00:00:38 53.8
2016-09-01 00:01:08 53.8
2016-09-01 00:01:38 53.8
2016-09-01 00:02:08 53.8
2016-09-01 00:02:38 54.1
2016-09-01 00:03:08 54.1
...
...
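If automatic date parsing is slow or guesses the day/month order wrongly, the timestamps can be parsed with an explicit format instead. This is a minimal sketch, assuming the file is comma-separated and month-first as in the sample; the in-memory csv_text is a hypothetical stand-in for the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file (comma-separated layout is an assumption).
csv_text = """Timestamp,Temperature
9/1/2016 0:00:08,53.8
9/1/2016 0:00:38,53.8
9/1/2016 1:02:38,54.1
9/2/2016 0:00:20,32
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse month/day/year explicitly so 9/1/2016 is read as 1 September 2016.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M:%S')
df = df.set_index('Timestamp')
print(df)
```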
Then use resample by hours and aggregate with Resampler.mean, but note that you get NaN for every hour that has no data in the DatetimeIndex:
print (df.resample('H').mean())
Temperature
Timestamp
2016-09-01 00:00:00 53.980952
2016-09-01 01:00:00 NaN
2016-09-01 02:00:00 NaN
2016-09-01 03:00:00 NaN
2016-09-01 04:00:00 NaN
2016-09-01 05:00:00 NaN
2016-09-01 06:00:00 NaN
2016-09-01 07:00:00 NaN
2016-09-01 08:00:00 NaN
2016-09-01 09:00:00 NaN
2016-09-01 10:00:00 NaN
2016-09-01 11:00:00 NaN
2016-09-01 12:00:00 NaN
2016-09-01 13:00:00 NaN
2016-09-01 14:00:00 NaN
2016-09-01 15:00:00 NaN
2016-09-01 16:00:00 NaN
2016-09-01 17:00:00 NaN
2016-09-01 18:00:00 NaN
2016-09-01 19:00:00 NaN
2016-09-01 20:00:00 NaN
2016-09-01 21:00:00 NaN
2016-09-01 22:00:00 NaN
2016-09-01 23:00:00 NaN
2016-09-02 00:00:00 32.000000
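If the empty hours are not wanted, one option is to drop the NaN rows after resampling. The sketch below uses a small made-up frame rather than the real file, and the lowercase 'h' alias, which recent pandas versions prefer over 'H'.

```python
import pandas as pd

# Small made-up sample: two readings in hour 0, one in hour 2, one on the next day.
idx = pd.to_datetime(['2016-09-01 00:00:08', '2016-09-01 00:00:38',
                      '2016-09-01 02:10:00', '2016-09-02 00:00:20'])
df = pd.DataFrame({'Temperature': [53.8, 54.0, 55.0, 32.0]}, index=idx)

# Hourly means, keeping only the hours that actually contain data.
hourly = df.resample('h').mean().dropna()
print(hourly)
```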
Another solution is to remove the minutes and seconds by casting the index values array to hours, and then groupby:
print (df.index.values.astype('<M8[h]'))
['2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
'2016-09-01T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
'2016-09-02T00']
print (df.groupby([df.index.values.astype('<M8[h]')]).mean())
Temperature
2016-09-01 53.980952
2016-09-02 32.000000
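The same truncation can probably be written more readably with DatetimeIndex.floor, which rounds each timestamp down to the start of its hour. A minimal sketch on made-up data:

```python
import pandas as pd

# Small made-up sample covering two hours on one day and one hour on the next.
idx = pd.to_datetime(['2016-09-01 00:00:08', '2016-09-01 00:00:38',
                      '2016-09-01 02:10:00', '2016-09-02 00:00:20'])
df = pd.DataFrame({'Temperature': [53.8, 54.0, 55.0, 32.0]}, index=idx)

# Round every timestamp down to the hour, then average within each hour.
hourly = df.groupby(df.index.floor('h')).mean()
print(hourly)
```

Unlike resample, this only produces rows for hours that contain data, so no NaN rows appear.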
If needed, you can also groupby the strings returned by DatetimeIndex.strftime:
print (df.index.strftime('%m%d%H'))
['090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
'090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
'090100' '090100' '090100' '090100' '090100' '090200' '090200' '090200'
'090200' '090200' '090200' '090200' '090200' '090200' '090200' '090200'
'090200' '090200' '090200' '090200' '090200']
print (df.groupby([df.index.strftime('%m%d%H')]).mean())
Temperature
090100 53.980952
090200 32.000000
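Since the question asks for both the average and the minimum, and its expected output uses day-month-hour keys such as 010900, a sketch along these lines may be closer to what is wanted. The '%d%m%H' key format is inferred from that expected output and the data is made up.

```python
import pandas as pd

# Small made-up sample covering two hours on one day and one hour on the next.
idx = pd.to_datetime(['2016-09-01 00:00:08', '2016-09-01 00:00:38',
                      '2016-09-01 02:10:00', '2016-09-02 00:00:20'])
df = pd.DataFrame({'Temperature': [53.8, 54.0, 55.0, 32.0]}, index=idx)

# Day-month-hour keys (assumed format), with mean and min per key.
keys = df.index.strftime('%d%m%H')
stats = df.groupby(keys)['Temperature'].agg(['mean', 'min'])
print(stats)
```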
Or, if you need to groupby the hour of day only, use DatetimeIndex.hour:
print (df.index.hour)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
print (df.groupby([df.index.hour]).mean())
Temperature
0 44.475676
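Note that grouping by hour alone pools the same hour from every day, which is why the result above mixes both dates. If the days should stay separate, grouping by date and hour together is one way to do it; the sketch below contrasts the two on made-up data.

```python
import pandas as pd

# Small made-up sample covering two hours on one day and one hour on the next.
idx = pd.to_datetime(['2016-09-01 00:00:08', '2016-09-01 00:00:38',
                      '2016-09-01 02:10:00', '2016-09-02 00:00:20'])
df = pd.DataFrame({'Temperature': [53.8, 54.0, 55.0, 32.0]}, index=idx)

# Hour of day only: hour 0 of both days ends up in the same group.
by_hour = df.groupby(df.index.hour).mean()

# Date and hour together: each day keeps its own hourly groups.
by_day_hour = df.groupby([df.index.date, df.index.hour]).mean()

print(by_hour)
print(by_day_hour)
```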
For readability, I would first define a new column hour and then groupby:
df = pd.read_csv('/home/kk/Desktop/cal_Avg.csv')
# Keep everything before the first colon, e.g. '9/1/2016 0' (date plus hour)
df['hour'] = df['Timestamp'].str.split(':').str[0]
df[['hour','Temperature']].groupby('hour').mean()
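A variant of the column approach keeps hour as a real timestamp instead of a string, which makes it easy to look up one specific date and hour afterwards. This is only a sketch: the in-memory csv_text is a hypothetical stand-in for the file, and the comma-separated, month-first layout is an assumption.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file (comma-separated layout is an assumption).
csv_text = """Timestamp,Temperature
9/1/2016 0:00:08,53.8
9/1/2016 0:00:38,54.0
9/1/2016 2:10:00,55.0
9/2/2016 0:00:20,32.0
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse the timestamps and round each one down to the start of its hour.
df['hour'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y %H:%M:%S').dt.floor('h')
hourly = df.groupby('hour')['Temperature'].agg(['mean', 'min'])

# Look up one specific date and hour.
print(hourly.loc[pd.Timestamp('2016-09-01 00:00')])
```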