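I want to apply a discrete (not rolling) seven-day averaging period to a data set, where each seven-day window starts only when a sample is "found" and is not based on the calendar week.
I have tried the following code, but the problem is that it produces a rolling average for every sample in the data set. Instead, I need to aggregate all samples that fall within one averaging period into a single sample.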
library(plyr)
library(dplyr)
library(lubridate)
Analyte<-c("Copper", "Copper", "Copper", "Copper", "Nickel", "Nickel", "Nickel")
Date<-mdy(c("1/1/2015", "1/3/2015", "1/12/2015", "1/15/2015", "1/3/2015", "1/6/2015", "1/8/2015"))
Matrix<-c("Water", "Water", "Water", "Water", "Water", "Water", "Water")
Fraction<-c("Total", "Total", "Total", "Total", "Dissolved", "Dissolved", "Dissolved")
Result<-c(0.6, 0.3, 0.5, 0.6, 0.1, 0.9, 1.0)
d<-cbind.data.frame(Analyte, Date, Matrix, Fraction, Result)
d$Date2<-d$Date
d$dateinterval<-interval(d$Date2-days(7), d$Date2+days(7))
d2<-ddply(d, c("Analyte", "Matrix", "Fraction"), function(df){
  SevenDayResultMean<-rep(NA, length(df$Date))
  SevenDayN<-rep(NA, length(df$Date))
  for(i in 1:length(df$Date)){
    SevenDayResultMean[i]<-mean(df$Result[df$Date2%within%df$dateinterval[i]], na.rm=TRUE)
    SevenDayN[i]<-length(df$Result[df$Date2%within%df$dateinterval[i]])
  }
  return(data.frame(SevenDayResultMean=SevenDayResultMean, Date=as.character(df$Date), SevenDayN=SevenDayN))
})
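The code above returns the following table, which is a rolling average and not what I need. In the table below, the first nickel sample is averaged with the following two nickel samples. The second sample is then averaged with the first and the last sample, and so on.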
Analyte  Matrix  Fraction   SevenDayResultMean  Date        SevenDayN
Copper   Water   Total      0.45                2015-01-01  2
Copper   Water   Total      0.3                 2015-01-03  2
Copper   Water   Total      0.55                2015-01-12  2
Copper   Water   Total      0.6                 2015-01-15  2
Nickel   Water   Dissolved  0.67                2015-01-03  3
Nickel   Water   Dissolved  0.95                2015-01-06  3
Nickel   Water   Dissolved  1.0                 2015-01-08  3
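Ideally, I would define an averaging period and then group by like values of all the other variables. I need something like the following table: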
Analyte  Date       Matrix  Fraction   Result
Copper   1/1/2015   Water   Total      0.45
Copper   1/12/2015  Water   Total      0.55
Nickel   1/3/2015   Water   Dissolved  0.67
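Here, the first two samples are averaged because they share the same fraction, matrix, and analyte and fall within seven days of the first sample; together they become the first entry in the result table. The same applies to the next two copper samples and to all of the nickel samples. The date assigned to a sample in the result table can be any date, as long as it falls within the seven-day averaging period.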
Using dplyr, we could do:
library(dplyr)
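d %>%
  # assumes rows are already sorted by Date within each group
  group_by(Analyte, Matrix, Fraction) %>%
  # start a new interval whenever the gap to the previous sample is 7 days or more
  mutate(interval = cumsum(Date - lag(Date, default = min(Date)) >= 7)) %>%
  # dplyr >= 1.0.0 spells this argument `.add`; older versions used `add`
  group_by(interval, .add = TRUE) %>%
  summarise(Date = min(Date), Result = mean(Result)) %>%
  select(Analyte, Date, Matrix, Fraction, Result)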
#> Source: local data frame [3 x 5]
#> Groups: Analyte, Matrix, Fraction [2]
#>
#> Analyte Date Matrix Fraction Result
#> <fctr> <date> <fctr> <fctr> <dbl>
#> 1 Copper 2015-01-01 Water Total 0.4500000
#> 2 Copper 2015-01-12 Water Total 0.5500000
#> 3 Nickel 2015-01-03 Water Dissolved 0.6666667
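One caveat worth checking against your own data: the `cumsum()` step compares each sample with the one immediately before it, so a run of samples spaced less than seven days apart stays in one group no matter how long the run gets. For example, samples on 1/1, 1/5 and 1/9 would all be averaged together, although 1/9 is eight days after 1/1. If each window should instead be anchored at the first sample it contains, a rough sketch could look like the following. `assign_window()` is a hypothetical helper written for this example, not part of dplyr or lubridate, and the `>= width` cutoff is an assumption that mirrors the `>= 7` above.

```r
library(dplyr)

# Hypothetical helper (not from any package): walk through sorted dates and
# open a new window whenever a date lies `width` or more days after the
# first date of the current window.
assign_window <- function(dates, width = 7) {
  stopifnot(all(diff(as.numeric(dates)) >= 0))  # dates must be sorted
  id <- integer(length(dates))
  start <- dates[1]
  current <- 1L
  for (i in seq_along(dates)) {
    if (as.numeric(difftime(dates[i], start, units = "days")) >= width) {
      start <- dates[i]        # this sample opens the next window
      current <- current + 1L
    }
    id[i] <- current
  }
  id
}

# Same sample data as in the question, built with base R dates
d <- data.frame(
  Analyte  = c("Copper", "Copper", "Copper", "Copper", "Nickel", "Nickel", "Nickel"),
  Date     = as.Date(c("2015-01-01", "2015-01-03", "2015-01-12", "2015-01-15",
                       "2015-01-03", "2015-01-06", "2015-01-08")),
  Matrix   = "Water",
  Fraction = c("Total", "Total", "Total", "Total", "Dissolved", "Dissolved", "Dissolved"),
  Result   = c(0.6, 0.3, 0.5, 0.6, 0.1, 0.9, 1.0)
)

result <- d %>%
  arrange(Analyte, Matrix, Fraction, Date) %>%       # sort before windowing
  group_by(Analyte, Matrix, Fraction) %>%
  mutate(window = assign_window(Date)) %>%           # window id within each group
  group_by(Analyte, Matrix, Fraction, window) %>%
  summarise(Date = min(Date), Result = mean(Result)) %>%
  ungroup() %>%
  select(Analyte, Date, Matrix, Fraction, Result)

result
```

On the sample data this gives the same three rows as the pipeline above (Copper 0.45 and 0.55, Nickel 0.667); the two approaches only diverge when closely spaced samples stretch over more than seven days.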
Data:
library(lubridate)
Analyte <- c("Copper", "Copper", "Copper", "Copper", "Nickel", "Nickel", "Nickel")
Date <- mdy(c("1/1/2015", "1/3/2015", "1/12/2015", "1/15/2015", "1/3/2015", "1/6/2015", "1/8/2015"))
Matrix <- c("Water", "Water", "Water", "Water", "Water", "Water", "Water")
Fraction <- c("Total", "Total", "Total", "Total", "Dissolved", "Dissolved", "Dissolved")
Result <- c(0.6, 0.3, 0.5, 0.6, 0.1, 0.9, 1.0)
d <- data.frame(Analyte, Date, Matrix, Fraction, Result)