I have some hourly air pollution measurements.
Datetime | PM2.5 | station.id |
---|---|---|
2020-01-01 00:00:00 | 10 | 1 |
2020-01-01 01:00:00 | NA | 1 |
2020-01-01 02:00:00 | 15 | 1 |
2020-01-01 03:00:00 | NA | 1 |
2020-01-01 04:00:00 | 7 | 1 |
2020-01-01 05:00:00 | 20 | 1 |
2020-01-01 06:00:00 | 30 | 1 |
2020-01-01 00:00:00 | NA | 2 |
2020-01-01 01:00:00 | 17 | 2 |
2020-01-01 02:00:00 | 21 | 2 |
2020-01-01 03:00:00 | 55 | 2 |
The "most efficient" approach will almost certainly use data.table. Something like this:
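library(data.table)
setDT(your_data)  # convert to a data.table by reference

# Keep a (date, station) group unless it has at least 18 missing hours
# and a run of at least 8 consecutive missing hours.
# Note: the sample data below has no station.id column; drop it from `by` there.
your_data[, date := as.IDate(Datetime)][,
  if (!(sum(is.na(PM2.5)) >= 18 &
        with(rle(is.na(PM2.5)), max(lengths[values])) >= 8)) .SD,
  by = .(date, station.id)
]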
# date Datetime PM2.5
# 1: 2020-01-01 2020-01-01 00:00:00 10
# 2: 2020-01-01 2020-01-01 01:00:00 NA
# 3: 2020-01-01 2020-01-01 02:00:00 15
# 4: 2020-01-01 2020-01-01 03:00:00 NA
# 5: 2020-01-01 2020-01-01 04:00:00 7
# 6: 2020-01-01 2020-01-01 05:00:00 20
# 7: 2020-01-01 2020-01-01 06:00:00 30
Using this sample data:
your_data = fread(text = 'Datetime PM2.5
2020-01-01 00:00:00 10
2020-01-01 01:00:00 NA
2020-01-01 02:00:00 15
2020-01-01 03:00:00 NA
2020-01-01 04:00:00 7
2020-01-01 05:00:00 20
2020-01-01 06:00:00 30')
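
The `rle()` step in the filter is the least obvious part, so here is a minimal base-R sketch of it in isolation. The helper name `longest_na_run` is hypothetical (it is not part of the code above), and unlike the inline `max(lengths[values])` it returns 0 rather than `-Inf` with a warning when a group has no missing values.

```r
# Hypothetical helper: length of the longest run of consecutive NA values.
longest_na_run <- function(x) {
  runs <- rle(is.na(x))                  # runs of TRUE (NA) / FALSE (observed)
  na_runs <- runs$lengths[runs$values]   # keep only the lengths of the NA runs
  if (length(na_runs) == 0L) 0L else max(na_runs)
}

# PM2.5 values for station 1, taken from the table in the question
pm <- c(10, NA, 15, NA, 7, 20, 30)

sum(is.na(pm))                         # total missing hours: 2
longest_na_run(pm)                     # longest consecutive gap: 1
longest_na_run(c(NA, NA, NA, 5, NA))   # 3
```

Inside the data.table call, the same two quantities are computed per `by` group, so a helper like this could replace the `with(rle(...))` expression if the warning on fully observed days is a nuisance.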