How to compute the mean absolute error (MAE) between two data frames containing NaN in R



My data looks like this:

> dput(head(df1,25))
structure(list(Date = structure(c(16644, 16645, 16646, 16647, 
16648, 16649, 16650, 16651, 16652, 16653, 16654, 16655, 16656, 
16657, 16658, 16659, 16660, 16661, 16662, 16663, 16664, 16665, 
16666, 16667, 16668), class = "Date"), AU = c(0.241392906920806, 
0.257591745069017, 0.263305712230276, NaN, 0.252892547032525, 
0.251771180928526, 0.249211746794207, 0.257289083109259, 0.205017582640463, 
0.20072274573488, 0.210154167590338, 0.207384553271337, 0.193725450540089, 
0.199282601988984, 0.216267134143314, 0.217052471451736, NaN, 
0.220703029531909, 0.2164619798534, 0.223442036108148, 0.22061326758891, 
NaN, 0.277777461504811, NaN, 0.200839628485262)), row.names = c(NA, 
-25L), class = c("tbl_df", "tbl", "data.frame"))
> dput(head(df2,25))
structure(list(UF1 = c(0.2559, 0.2565, 0.257, 0.2577, 0.2583, 
0.259, 0.2596, 0.2603, 0.2611, 0.2618, 0.2625, 0.2633, 0.2641, 
0.2649, 0.2657, 0.2665, 0.2674, 0.2682, 0.2691, 0.27, 0.2709, 
0.2718, 0.2727, 0.2736, 0.2745), UF2 = c(0.2597, 0.2602, 0.2608, 
0.2614, 0.2621, 0.2627, 0.2634, 0.2641, 0.2648, 0.2655, 0.2663, 
0.267, 0.2678, 0.2686, 0.2694, 0.2702, 0.2711, 0.2719, 0.2728, 
0.2737, 0.2745, 0.2754, 0.2763, 0.2773, 0.2782), UF3 = c(0.2912, 
0.2915, 0.2918, 0.2922, 0.2926, 0.293, 0.2934, 0.2938, 0.2943, 
0.2947, 0.2952, 0.2957, 0.2962, 0.2968, 0.2973, 0.2979, 0.2985, 
0.2991, 0.2997, 0.3003, 0.3009, 0.3016, 0.3022, 0.3029, 0.3035
), Date = structure(c(16644, 16645, 16646, 16647, 16648, 16649, 
16650, 16651, 16652, 16653, 16654, 16655, 16656, 16657, 16658, 
16659, 16660, 16661, 16662, 16663, 16664, 16665, 16666, 16667, 
16668), class = "Date")), row.names = c(NA, 25L), class = "data.frame")

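I am trying to compute the mean absolute error (MAE) between the observed values in df1$AU and the predicted values in df2$UF1, df2$UF2 and df2$UF3, using the code below (taken from "How to make MAE and RAE functions without using library(Metrics)?"):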

mae1 <- function(df1$AU,df2$UF1, na.rm=TRUE)
{
mean(abs(df1$AU-df2$UF1), na.rm=na.rm)
}
mae1(df1$AU,df2$UF1, na.rm=TRUE)

But I always get this error:

Error in mean.default(abs(df1$AU - df2$UF1),  : 
object 'na.rm' not found

I also tried using library(Metrics):

mae(df1$AU, df2$UF1, na.rm=TRUE)

but I always get this error:

Error in mae(df1$AU, df2$UF1,  : 
unused argument (na.rm = TRUE)

The only thing that ran without an error was:

mean(abs(df1$AU-df2$UF1), na.rm=TRUE)

With this I get a value, but I don't know whether it corresponds to the "true" MAE.

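Notes:

  1. My data has NaN values.
  2. I think the error may be related to na.rm rather than to the mae function itself, but I could not solve it.

Any help would be appreciated.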

Try:

# o = observed, p = predicted, m = whether to drop NA/NaN differences
mae1 <- function(o, p, m = TRUE) {
  mean(abs(o - p), na.rm = m)
}
mae1(df1$AU, df2$UF1)
[1] 0.03733099
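
If you are unsure whether that number is the "true" MAE, one way to check is to compute it a second time while dropping the NaN pairs by hand. The sketch below uses small made-up vectors (obs and pred are hypothetical stand-ins, not your data) and shows that both routes agree:

```r
# Toy vectors (made-up values, not the asker's data)
obs  <- c(0.24, NaN, 0.25, 0.20, NaN)
pred <- c(0.26, 0.26, 0.26, 0.26, 0.27)

# Route 1: let mean() drop the NaN differences
mae_na_rm <- mean(abs(obs - pred), na.rm = TRUE)

# Route 2: keep only the pairs where both values are present,
# then average the absolute differences explicitly.
# is.na() returns TRUE for NaN as well as for NA.
ok <- !is.na(obs) & !is.na(pred)
mae_manual <- sum(abs(obs[ok] - pred[ok])) / sum(ok)

# Both routes should give the same number
all.equal(mae_na_rm, mae_manual)
```

So na.rm = TRUE simply means the MAE is averaged over the days where an observation exists; the NaN days do not count toward the denominator.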

Note: the body of your function should work fine; I don't know why you got those errors.
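
As for the second error, it is consistent with Metrics::mae() accepting only actual and predicted, with no na.rm argument. If you want to keep using that package, a possible workaround is to filter out the incomplete pairs before the call. This is only a sketch on made-up vectors (obs, pred and mae_value are hypothetical names), and it falls back to base R when Metrics is not installed:

```r
# Toy vectors (made-up values, not the asker's data)
obs  <- c(0.24, NaN, 0.25, 0.20, NaN)
pred <- c(0.26, 0.26, 0.26, 0.26, 0.27)

# Keep only the positions where both vectors have a usable value
ok <- !is.na(obs) & !is.na(pred)

if (requireNamespace("Metrics", quietly = TRUE)) {
  # Metrics::mae() takes just (actual, predicted), so pass filtered vectors
  mae_value <- Metrics::mae(obs[ok], pred[ok])
} else {
  # Base R equivalent when the package is unavailable
  mae_value <- mean(abs(obs[ok] - pred[ok]))
}

mae_value
```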

Base R solution:

# Extract the required vector names in a list:
# req_vec_names => list of vector names
req_vec_names <- list(
  act_vec_name = "AU",
  pred_vec_names = grep(
    "UF\\d+",  # the backslash must be doubled inside an R string
    c(
      colnames(df1),
      colnames(df2)
    ),
    value = TRUE
  )
)
# Function to create the mean absolute error:
# mae => function()
mae <- function(actual_vec, pred_vec){
  return(
    mean(
      abs(actual_vec - pred_vec),
      na.rm = TRUE
    )
  )
}
# Calculate the mean absolute error:
# maes => named double vector
# [[ ]] extracts a plain vector, which also works when df1 is a tibble
maes <- vapply(
  req_vec_names$pred_vec_names,
  function(x){
    mae(
      as.double(df1[[req_vec_names$act_vec_name]]),
      as.double(df2[[x]])
    )
  },
  double(1)
)
# Print result: named double vector => stdout(console)
maes
#UF1        UF2        UF3 
#0.03733099 0.04024130 0.06848576 
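
The code above subtracts the columns row by row, so it relies on df1 and df2 listing the same dates in the same order. Since both data frames carry a Date column, a more defensive variant is to merge on Date first. The following is a sketch on made-up data (toy_obs, toy_pred and the values in them are hypothetical), where the prediction rows are deliberately in reverse order:

```r
# Toy data frames (made-up values); toy_pred is in reverse date order
dates <- as.Date("2015-07-28") + 0:3
toy_obs <- data.frame(Date = dates, AU = c(0.24, NaN, 0.25, 0.20))
toy_pred <- data.frame(
  UF1  = c(0.26, 0.27, 0.28, 0.25),
  UF2  = c(0.30, 0.30, 0.30, 0.30),
  Date = rev(dates)
)

# Align observed and predicted values by date before subtracting
merged <- merge(toy_obs, toy_pred, by = "Date")

# Pick every prediction column whose name is "UF" followed by digits
pred_cols <- grep("^UF\\d+$", names(merged), value = TRUE)

# One MAE per prediction column, skipping NaN observations
maes_by_date <- vapply(
  merged[pred_cols],
  function(p) mean(abs(merged$AU - p), na.rm = TRUE),
  double(1)
)

maes_by_date
```

With your 25-row sample the dates already line up, so this should give the same numbers as the version above; the merge only matters when rows are missing or shuffled.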
