如何使脚本生效以纠正记录器在 R 中的季节性漂移?



我在水中安装了一堆二氧化碳记录仪,在开放水域季节每小时记录一次二氧化碳。在安装之前和之后,我已经在3种不同浓度的二氧化碳下对记录器进行了表征。

  • 我假设季节误差漂移将是线性的
  • 我假设我的特征点之间的误差将是线性的

我的脚本基于一个for循环,该循环遍历每个时间戳并更正值,这很有效,但速度不够快。我知道这可以在一秒钟内完成,但我不确定如何做到。我寻求一些建议,如果有人能告诉我怎么做,我将不胜感激。

基于基本R:的可复制示例

start <- as.POSIXct("2022-08-01 00:00:00")#time when logger is installed
stop <- as.POSIXct("2022-09-01 00:00:00")#time when retrieved
dt <- seq.POSIXt(start,stop,by=3600)#generate datetime column, measured hourly
#generate a bunch of values within my measured range
co2 <- round(rnorm(length(dt),mean=600,sd=100))
#generate dummy dataframe
dummy <- data.frame(dt,co2)
#actual values used in characterization
actual <- c(0,400,1000)
#measured in the container by the instruments being characterized
measured.pre <- c(105,520,1150)
measured.post <- c(115,585,1250)
diff.pre <- measured.pre-actual#diff at precharacterization
diff.post <- measured.post-actual#diff at post
#linear interpolation of how deviance from actual values change throughout the season
#I assume that the temporal drift is linear 
diff.0 <- seq(diff.pre[1],diff.post[1],length.out=length(dummy$dt))
diff.400 <- seq(diff.pre[2],diff.post[2],length.out = length(dummy$dt))
diff.1000 <-  seq(diff.pre[3],diff.post[3],length.out = length(dummy$dt))
#creates a data frame with the assumed drift at each increment throughout the season
dummy <- data.frame(dummy,diff.0,diff.400,diff.1000)
#this loop makes a 3-point calibration at each day in the dummy data set
co2.corrected <- vector()
for(i in 1:nrow(dummy)){
print(paste0("row: ",i))#to show the progress of the loop
diff.0 <- dummy$diff.0[i]#get the differences at characterization increments
diff.400 <- dummy$diff.400[i]
diff.1000 <- dummy$diff.1000[i]
#values below are only used for encompassing the range of measured values in the characterization
#this is based on the interpolated difference at the given time point and the known concentrations used 
measured.0 <- diff.0+0
measured.400 <- diff.400+400
measured.1000 <- diff.1000+1000

#linear difference between calibration at 0 and 400
seg1 <- seq(diff.0,diff.400,length.out=measured.400-measured.0)
#linear difference between calibration at 400 and 1000
seg2 <- seq(diff.400,diff.1000,length.out=measured.1000-measured.400)
#bind them together to get one vector
correction.ppm <- c(seg1,seg2)


#the complete range of measured co2 in the characterization.
#in reality it can not be below 0 and thus it can not be below the minimum measured in the range
measured.co2.range <- round(seq(measured.0,measured.1000,length.out=length(correction.ppm)))
#generate a table from which we can characterize the measured values from
correction.table <- data.frame(measured.co2.range,correction.ppm)

co2 <- dummy$co2[i] #measured co2 at the current row
#find the measured value in the table and extract the difference
diff <- correction.table$correction.ppm[match(co2,correction.table$measured.co2.range)]
#correct the value and save it to vector
co2.corrected[i] <- co2-diff

}
#generate column with calibrated values
dummy$co2.corrected <- co2.corrected

这是我在查看代码后所理解的。您有一系列CO2浓度读数,但需要根据时间序列开始和结束时的特征测量值对其进行校正。这两组表征测量都是使用三种已知浓度进行的:0、400和1000。

您的代码似乎试图应用双线性插值(随着时间和浓度的变化(来应用所需的校正。这很容易矢量化:

set.seed(1)
start <- as.POSIXct("2022-08-01 00:00:00")#time when logger is installed
stop <- as.POSIXct("2022-09-01 00:00:00")#time when retrieved
dt <- seq.POSIXt(start,stop,by=3600)#generate datetime column, measured hourly
#generate a bunch of values within my measured range
co2 <- round(rnorm(length(dt),mean=600,sd=100))
#actual values used in characterization
actual <- c(0,400,1000)
#measured in the container by the instruments being characterized
measured.pre <- c(105,520,1150)
measured.post <- c(115,585,1250)
# interpolate the reference concentrations over time
cref <- mapply(seq, measured.pre, measured.post, length.out = length(dt))
#generate dummy dataframe with corrected values
dummy <- data.frame(
dt,
co2,
co2.corrected = ifelse(
co2 < cref[,2],
actual[1] + (co2 - cref[,1])*(actual[2] - actual[1])/(cref[,2] - cref[,1]),
actual[2] + (co2 - cref[,2])*(actual[3] - actual[2])/(cref[,3] - cref[,2])
)
)
head(dummy)
#>                    dt co2 co2.corrected
#> 1 2022-08-01 00:00:00 537      416.1905
#> 2 2022-08-01 01:00:00 618      493.2432
#> 3 2022-08-01 02:00:00 516      395.9776
#> 4 2022-08-01 03:00:00 760      628.2707
#> 5 2022-08-01 04:00:00 633      507.2542
#> 6 2022-08-01 05:00:00 518      397.6533

我不知道你在计算什么(我觉得可以用不同的方法(,但你可以通过以下方式提高速度:

  • 删除print,这在循环中需要大量时间
  • 在每次迭代中删除data.frame创建,这很慢,这里不需要

这个循环应该更快:

for(i in 1:nrow(dummy)){
diff.0 <- dummy$diff.0[i]
diff.400 <- dummy$diff.400[i]
diff.1000 <- dummy$diff.1000[i]

measured.0 <- diff.0+0
measured.400 <- diff.400+400
measured.1000 <- diff.1000+1000

seg1 <- seq(diff.0,diff.400,length.out=measured.400-measured.0)
seg2 <- seq(diff.400,diff.1000,length.out=measured.1000-measured.400)
correction.ppm <- c(seg1,seg2)

s <- seq(measured.0,measured.1000,length.out=length(correction.ppm))
measured.co2.range <- round(s)

co2 <- dummy$co2[i]
diff <- correction.ppm[match(co2, measured.co2.range)]
co2.corrected[i] <- co2-diff
}

p.s.现在我测试中最慢的部分是round(s)。也许可以删除或重写。。。

最新更新