r - Efficient way to apply a Savitzky-Golay filter to the rows of a data.table across selected columns



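I wrote a function that applies a Savitzky-Golay filter to each row of a data.table. It takes the index of the first measurement column as a parameter; that column and every column after it contain the values to be filtered. The processed rows are updated in place.

My function works, but it is slow.

How can I change the function so that it runs more efficiently on larger amounts of data?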

MWE:

library(data.table)
library(pracma)
library(datasets)
data(iris)
setDT(iris)
# Reorder columns because the function expects the columns to filter
# to run from a defined column through to the last column
setcolorder(iris, "Species")

savitzky_golay <- function(dt, id_of_first_sample_col = 2, win_size = 5) {

  c_names_samples <- colnames(dt)[id_of_first_sample_col:ncol(dt)]

  for (i in seq_len(nrow(dt))) {
    mat <- as.numeric(dt[i, id_of_first_sample_col:ncol(dt)]) # Get sample data of row i as a numeric vector
    mat <- savgol(mat, fl = win_size, forder = 2, dorder = 0) # Savitzky-Golay filter

    dt[i, (c_names_samples) := as.list(mat)] # Update columns of current row by reference
  }
  # Returns nothing useful, as the update is done by reference.
}
savitzky_golay(iris)

Try this:

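savitzky_golay_new <- function(dt, id_of_first_sample_col = 2, win_size = 5) {
  c_names_samples <- colnames(dt)[id_of_first_sample_col:ncol(dt)]
  # apply() filters each row and returns one column per input row;
  # asplit(..., 1) splits that result back into one vector per sample column
  dt[, (c_names_samples) := asplit(apply(.SD, 1, function(x) savgol(x, fl = win_size, forder = 2, dorder = 0)), 1),
     .SDcols = c_names_samples]
}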
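
The reason for the asplit(..., 1) step is that apply() over rows returns its result transposed: one column per input row. A minimal base R sketch of that shape handling, where row_filter is a hypothetical stand-in for savgol() (it only centers each row on its mean) and the matrix m is made up for illustration:

```r
# Hypothetical stand-in for savgol(): center each row on its own mean.
row_filter <- function(x) x - mean(x)

# Two "rows" of three "sample columns" (a, b, c).
m <- matrix(c(1, 2, 3,
              10, 20, 30),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c")))

# apply() over rows yields a 3 x 2 matrix: one column per input row.
res <- apply(m, 1, row_filter)

# Splitting along margin 1 gives one vector per original column,
# each as long as the number of input rows, which is the shape := expects.
cols <- asplit(res, 1)

print(dim(res))
print(length(cols))
```

Each element of cols then lines up with one sample column, so the whole block of columns can be assigned in a single step instead of row by row.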

Performance comparison:

microbenchmark::microbenchmark(savitzky_golay_new(dt2), savitzky_golay(dt1))
Unit: milliseconds
                    expr     min       lq     mean   median       uq      max neval
 savitzky_golay_new(dt2) 12.7808 13.69695 15.63821 14.31785 15.17705  31.2701   100
     savitzky_golay(dt1) 71.4231 81.96115 87.97737 86.41265 90.42620 239.7945   100
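
The benchmark does not show how dt1 and dt2 were created. Because both functions update their argument by reference, each presumably received its own copy of the data. A minimal sketch of that distinction, using a small made-up table (base_dt, alias, and dt1 below are illustrative names, not the objects from the benchmark):

```r
library(data.table)

# Made-up data for illustration only.
base_dt <- data.table(id = 1:3, a = c(1, 2, 3), b = c(4, 5, 6))

alias <- base_dt        # plain assignment: both names point to the same table
dt1   <- copy(base_dt)  # copy(): an independent table

dt1[, a := a * 10]      # changes dt1 only
alias[, b := b + 1]     # also changes base_dt, since alias is not a copy

print(base_dt)
print(dt1)
```

Note also that with by-reference updates, every benchmark repetition filters data that the previous repetition already filtered. That does not change the amount of work per call, but the tables no longer hold the original values afterwards.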
