Using aggregate in R with a user-defined function that depends on two columns



I have a data frame, data_sulfat:

year Concentration Precipitaion
2000 19.01 23.7251908396947
2000 3.38 25.8842975206612
2000 7.78 12.9504950495049
2000 50.33 22.6153846153846
2000 0.04 19.063829787234
2000 1.4 67.8202247191011
2000 5.11 88.4383561643836
2000 4.53 42.7234042553192
2001 9.57 33
2001 5.25 21.3023255813953
2001 2.33 28.9491525423729
2001 9.29 42.7428571428571
2001 4.01 16.8813559322034
2001 0.39 125.093525179856
2001 1.14 50
2001 6.1 51.0909090909091
2001 1.19 25.0833333333333
2001 4.09 35.921568627451
2001 1.89 127.396226415094
2002 1.28 100.266666666667
2002 5.96 29.5922330097087
2002 2.36 49.0526315789474
2002 5.47 121.756097560976
2002 13.03 53.6978417266187
2002 6.57 23.7575757575758
2002 5.11 74.4375
2002 0.65 29.3592233009709
2002 0.39 180.512195121951
2002 3.35 20.5423728813559
2002 12.92 53.5789473684211
2002 10.01 24.5274725274725
2002 4.66 39.6363636363636
2002 2.25 13.6901408450704
2002 1.31 96.24
2002 1.13 13.1428571428571
2002 5.45 19.8347107438017
2002 6.4 57.375
2002 1.06 186
2002 3.09 59.2142857142857

I have my own function, which depends on two columns, Concentration and Precipitaion:

user_function <- function(Concentration, Precipitaion){
  return(mean(rnorm(10000, 
                    mean = mean(Concentration),
                    sd = sd(Precipitaion))))}

I tried to apply this function per year with aggregate:

aggregate(data_sulfat, by=list(data_sulfat$year), 
          FUN = user_function(data_sulfat$Concentration,
                              data_sulfat$Precipitaion))

This gives an error. How do I correctly use aggregate with a function that depends on two or more columns?

You have several options. First, here is your data in reproducible form:
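It may help to see why the original call fails. The sketch below uses a small made-up data frame (df, with columns g, a and b; these names and values are hypothetical, not from the question). It shows two things: FUN must be a function, not the result of calling one, and aggregate passes each column to FUN separately, so FUN never sees two columns at once.

```r
# Small stand-in data frame (hypothetical values, not the question's data)
df <- data.frame(g = c(1, 1, 2, 2), a = c(1, 2, 3, 4), b = c(10, 20, 30, 50))

# Here FUN is a number (the value of mean(df$a)), not a function,
# so aggregate() stops with an error; tryCatch() captures its message
res <- tryCatch(
  aggregate(df, by = list(df$g), FUN = mean(df$a)),
  error = function(e) conditionMessage(e)
)
print(res)

# With a proper function, aggregate() still works column by column:
# mean() is called once for "a" and once for "b" within each group
agg <- aggregate(df[c("a", "b")], by = list(g = df$g), FUN = mean)
print(agg)
```

Because of the column-by-column behaviour, a function that needs both columns has to receive the whole per-group data frame instead.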

d <- structure(list(
    year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 
        2000L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 
        2001L, 2001L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 
        2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 
        2002L, 2002L), 
    Concentration = c(19.01, 3.38, 7.78, 50.33, 0.04, 1.4, 5.11, 4.53, 9.57, 
        5.25, 2.33, 9.29, 4.01, 0.39, 1.14, 6.1, 1.19, 4.09, 1.89, 1.28, 
        5.96, 2.36, 5.47, 13.03, 6.57, 5.11, 0.65, 0.39, 3.35, 12.92, 
        10.01, 4.66, 2.25, 1.31, 1.13, 5.45, 6.4, 1.06, 3.09), 
    Precipitaion = c(23.7251908396947, 25.8842975206612, 12.9504950495049, 
        22.6153846153846, 19.063829787234, 67.8202247191011, 88.4383561643836, 
        42.7234042553192, 33, 21.3023255813953, 28.9491525423729, 
        42.7428571428571, 16.8813559322034, 125.093525179856, 50, 
        51.0909090909091, 25.0833333333333, 35.921568627451, 127.396226415094, 
        100.266666666667, 29.5922330097087, 49.0526315789474, 121.756097560976, 
        53.6978417266187, 23.7575757575758, 74.4375, 29.3592233009709, 
        180.512195121951, 20.5423728813559, 53.5789473684211, 24.5274725274725, 
        39.6363636363636, 13.6901408450704, 96.24, 13.1428571428571, 
        19.8347107438017, 57.375, 186, 59.2142857142857)), 
    class = "data.frame", row.names = c(NA, -39L))

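A simple solution is to use split and lapply. Rewrite the function so that it takes a data frame, then apply it to each per-year piece: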

user_function <- function(d) {
    mean(rnorm(10000, mean = mean(d$Concentration), sd = sd(d$Precipitaion)))
}
lapply(split(d, d$year), user_function)
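Two other base R options follow the same split-and-apply idea. This is a minimal sketch: d_small is a shortened, rounded version of the first rows of the question's data (an assumption made so the block runs on its own), and set.seed is added only so that the random draws are reproducible.

```r
# First rows of the question's data, rounded and shortened (assumption:
# used here only to keep the example self-contained)
d_small <- data.frame(
  year = c(2000L, 2000L, 2000L, 2001L, 2001L, 2001L),
  Concentration = c(19.01, 3.38, 7.78, 9.57, 5.25, 2.33),
  Precipitaion = c(23.73, 25.88, 12.95, 33, 21.30, 28.95)
)

user_function <- function(d) {
  mean(rnorm(10000, mean = mean(d$Concentration), sd = sd(d$Precipitaion)))
}

set.seed(1)  # rnorm() is random; fix the seed for reproducible results

# sapply() simplifies the per-year results to a named numeric vector
res_vec <- sapply(split(d_small, d_small$year), user_function)

# by() splits the data frame by year and applies the function to each piece
res_by <- by(d_small, d_small$year, user_function)

# Collect the results in a data frame, one row per year
res_df <- data.frame(year = names(res_vec), value = unname(res_vec))
print(res_df)
```

sapply is convenient when each group yields a single number, while by keeps the grouping information in its printed output.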
