Looping over grouped data with the nls function in R



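I have a grouped dataset. My data is grouped by GaugeID. I have an nls call that I want to run on each group in a loop, producing one output value per group.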

library(tidyverse)
library(stats)
# sample of data (yearly), first column is gauge (grouping variable), year, then two formula inputs PETvP and ETvP 
# A tibble: 20 x 4
GaugeID  WATERYR  PETvP  ETvP 
<chr>      <dbl>  <dbl> <dbl>  
1 06892000    1981  0.854 0.754 
2 06892000    1982  0.798 0.708 
3 06892000    1983  1.12  0.856 
4 06892000    1984  0.905 0.720  
5 06892000    1985  0.721 0.618 
6 06892000    1986  0.717 0.625 
7 06892000    1987  0.930 0.783 
8 06892000    1988  1.57  0.945 
9 06892000    1989  1.15  0.739 
10 06892000    1990  0.933 0.805 
11 08171300    1981  0.854 0.754 
12 08171300    1982  0.798 0.708 
13 08171300    1983  1.12  0.856 
14 08171300    1984  0.905 0.720  
15 08171300    1985  0.721 0.618 
16 08171300    1986  0.717 0.625 
17 08171300    1987  0.930 0.783 
18 08171300    1988  1.57  0.945 
19 08171300    1989  1.15  0.739 
20 08171300    1990  0.933 0.805 
# attempted for loop
for (i in unique(yearly$GaugeID)) {
myValue = nls(ETvP[i] ~ I(1 + PETvP[i] - (1 + PETvP[i]^(w))^(1/w)), data = yearly,
start =  list(w = 2), trace = TRUE)
}

I get the following error:

Error in model.frame.default(formula = ~ETvP + i + PETvP, data = yearly) : 
variable lengths differ (found for 'i')

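I haven't found much information on looping with the nls function. Essentially, I am fitting curves and need to output the curve parameter (the value of w) for each gauge. The formula works if I apply it to a single gauge (that is, if I subset the data to the first gauge), but it fails when I try to use it on the whole data frame with grouped data. For example, this works: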

# gaugeA 
# A tibble: 10 x 4
GaugeID  WATERYR  PETvP  ETvP 
<chr>      <dbl>  <dbl> <dbl>  
1 06892000    1981  0.854 0.754 
2 06892000    1982  0.798 0.708 
3 06892000    1983  1.12  0.856 
4 06892000    1984  0.905 0.720  
5 06892000    1985  0.721 0.618 
6 06892000    1986  0.717 0.625 
7 06892000    1987  0.930 0.783 
8 06892000    1988  1.57  0.945 
9 06892000    1989  1.15  0.739 
10 06892000    1990  0.933 0.805 
test = nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)), data = gaugeA, 
start =  list(w = 2), trace = TRUE)
1.574756    (4.26e+00): par = (2)
0.2649549   (1.46e+00): par = (2.875457)
0.09466832  (3.32e-01): par = (3.59986)
0.08543699  (2.53e-02): par = (3.881397)
0.08538308  (9.49e-05): par = (3.907099)
0.08538308  (1.13e-06): par = (3.907001)
> test
Nonlinear regression model
model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
data: gaugeA
w 
3.907 
residual sum-of-squares: 0.08538
Number of iterations to convergence: 5 
Achieved convergence tolerance: 1.128e-06

Any ideas on how to get the per-group results for the whole grouped data frame? It has over 600 different gauges. Thanks in advance.

Either of the following works:

Using summarise:

df %>%
group_by(GaugeID) %>%
summarise(result = list(nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)), 
data = cur_data(), 
start =  list(w = 2)))) %>%
pull(result)
[[1]]
Nonlinear regression model
model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
data: cur_data()
w 
3.607 
residual sum-of-squares: 0.01694
Number of iterations to convergence: 5 
Achieved convergence tolerance: 7.11e-08
[[2]]
Nonlinear regression model
model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
data: cur_data()
w 
1.086 
residual sum-of-squares: 0.1532
Number of iterations to convergence: 5 
Achieved convergence tolerance: 2.685e-07


Using map:

df %>%
group_split(GaugeID) %>%
map(~nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)), 
data = .x, 
start =  list(w = 2)))
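Since the question asks for one value of w per gauge rather than the full model objects, the split-and-fit idea can be taken one step further. The following is a minimal sketch in base R, not part of the original answer: the data are synthetic and made up for illustration, and `make_gauge` and `fit_w` are hypothetical helper names. Wrapping each fit in `tryCatch` is an assumption about what is useful with 600+ gauges, so that one failed fit does not stop the whole run.

```r
# Synthetic data for illustration only (NOT the poster's data):
# two gauges generated from the same curve with a known w plus small noise.
set.seed(1)
make_gauge <- function(id, w, n = 10) {
  PETvP <- runif(n, 0.7, 1.6)
  ETvP  <- 1 + PETvP - (1 + PETvP^w)^(1 / w) + rnorm(n, sd = 0.01)
  data.frame(GaugeID = id, PETvP = PETvP, ETvP = ETvP)
}
yearly <- rbind(make_gauge("A", 2.5), make_gauge("B", 3.5))

# Fit one gauge and return only the estimate of w;
# return NA instead of stopping if nls() throws an error.
fit_w <- function(d) {
  tryCatch(
    coef(nls(ETvP ~ 1 + PETvP - (1 + PETvP^w)^(1 / w),
             data = d, start = list(w = 2)))[["w"]],
    error = function(e) NA_real_
  )
}

# One data frame per gauge, one fitted w per data frame
w_by_gauge <- vapply(split(yearly, yearly$GaugeID), fit_w, numeric(1))
result <- data.frame(GaugeID = names(w_by_gauge), w = unname(w_by_gauge))
result
```

Gauges whose fit failed show up as NA in the `w` column and can be refitted later with a different starting value.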

For looping a function over grouped data, I usually prefer purrr and dplyr. I can't edit the data, but maybe this works:

library(dplyr)
library(purrr)
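# fit on each group's own rows (cur_data()) and wrap the model in list()
# so that summarise() can store it in a list-column
yearly %>%
group_by(GaugeID) %>%
summarise(test = list(nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)),
data = cur_data(), start = list(w = 2), trace = TRUE)))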
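The for loop from the question can also be made to work. In the original attempt, `i` holds a GaugeID string, so `ETvP[i]` does not select that gauge's rows; subsetting the data frame inside the loop does. A minimal sketch on synthetic data (made up for illustration; the names `gauges`, `sub` and `w_hat` are hypothetical and not from the original post):

```r
# Synthetic data for illustration only (NOT the poster's data)
set.seed(42)
yearly <- do.call(rbind, lapply(c("g1", "g2"), function(id) {
  PETvP <- runif(12, 0.7, 1.6)
  data.frame(GaugeID = id, PETvP = PETvP,
             ETvP = 1 + PETvP - (1 + PETvP^3)^(1 / 3) + rnorm(12, sd = 0.01))
}))

gauges <- unique(yearly$GaugeID)
# Pre-allocate a named vector: one slot for w per gauge
w_hat <- setNames(rep(NA_real_, length(gauges)), gauges)

for (g in gauges) {
  # Keep only the rows that belong to the current gauge
  sub <- yearly[yearly$GaugeID == g, ]
  fit <- try(nls(ETvP ~ 1 + PETvP - (1 + PETvP^w)^(1 / w),
                 data = sub, start = list(w = 2)),
             silent = TRUE)
  # Store w only when the fit succeeded; otherwise the slot stays NA
  if (!inherits(fit, "try-error")) w_hat[[g]] <- coef(fit)[["w"]]
}
w_hat
```

Storing the result in a pre-allocated vector also avoids the other problem in the original loop, where `myValue` was overwritten on every iteration.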

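The loop can be eliminated by formulating the problem as a single model. Make sure GaugeID is a factor, subscript w by GaugeID in the formula, and supply a list of starting values whose w component is a vector with one starting value for each level of GaugeID.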

df$GaugeID <- factor(df$GaugeID)
fo <- ETvP ~ 1 + PETvP - (1 + PETvP^(w[GaugeID]))^(1/w[GaugeID])
st <- list(w = rep(2, nlevels(df$GaugeID)))
fm <- nls(fo, df, start = st)
fm
summary(fm)
data.frame(GaugeID = levels(df$GaugeID), coef(summary(fm)), check.names = FALSE)
