r-按列中的每个唯一id运行回归，每个唯一id有多个观测值

我有一个数据帧，每个唯一的行id(subjectID(都有多个观测值，看起来像这样(一个简短的例子(：

df <- data.frame(subjectID = rep(c(1:4), each=3),
y = sample(0:2, 4, replace = TRUE),
x = rep(c(0:2), times = 4))
df <- as.tibble(df)
> df
# A tibble: 12 x 3
subjectID     y     x
<int> <int> <int>
1         1     0     0
2         1     1     1
3         1     1     2
4         2     2     0
5         2     0     1
6         2     1     2
7         3     1     0
8         3     2     1
9         3     0     2
10         4     1     0
11         4     1     1
12         4     2     2

对于每个唯一的subjectID"；unique(df$subjectID("；(即上面例子中的4(我想：

#1. regress y on x (3 x- and y-observations for each subjectID ) to obtain coefficients
reg <- lm(y~x)
#2. store intercept 
intercept <- reg$coefficients[1]
#3. store slope
slope <- reg$coefficients[2]
#4. calculate Spearman's rho with library(ggpubr)
rho <- cor(x, y, method = c("spearman"))
#5. calculate the Spearman p-value with library(ggpubr)
pvalue <- cor.test(x, y, method=c("spearman"))

我想将结果存储在一个新的数据帧中，如下所示：

# creating an data frame to store regression output for each unique subjectID
output <- tibble(
subjectID = rep(c(1:4), each=1),
intercept = rep(c(0), each=4),
slope = rep(c(0), each=4),
rho = rep(c(0), each=4),
pvalue = rep(c(0), each=4))

> output
# A tibble: 4 x 5
subjectID intercept slope   rho   pvalue
<int>     <dbl> <dbl> <dbl>    <dbl>
1         1         0     0     0        0
2         2         0     0     0        0
3         3         0     0     0        0
4         4         0     0     0        0

对于每个唯一的subjectID，我有21个x和y-obs，在"真实"数据帧中有100多个唯一的subject ID，所以我希望避免手动操作。所以我想知道是否可以创建一个循环函数来实现这一点？

在dplyr或base R中，我尝试使用管道和group_by(subjectID(，但无法设置函数。

这有点冗长，但它符合@GregorThomas所指的：

library(dplyr)
library(tidyr)
library(purrr)
library(broom)
set.seed(41)
df <- data.frame(
subjectID = rep(c(1:4), each = 3),
y = sample(0:2, 4, replace = TRUE),
x = rep(c(0:2), times = 4)
)
df <- as_tibble(df)
df %>%
nest(c(y, x)) %>%
mutate(reg = map(data, ~ lm(y ~ x, data = .x))) %>%
mutate(cor = map(data, ~ with(.x, cor.test(y, x)))) %>%
mutate(tidied_reg = map(reg, tidy), tidied_cor = map(cor, tidy)) %>%
select(subjectID, tidied_reg, tidied_cor) %>%
unnest() %>%
select(subjectID, term, estimate, statistic1, p.value1) %>%
pivot_wider(names_from = term, values_from = estimate) %>%
select(subjectID, intercept = `(Intercept)`, slope = x, rho = statistic1, pvalue = p.value1)
#> # A tibble: 4 x 5
#>   subjectID intercept  slope    rho pvalue
#>       <int>     <dbl>  <dbl>  <dbl>  <dbl>
#> 1         1     1.5   -0.5   -0.577  0.667
#> 2         2     1.5   -0.5   -0.577  0.667
#> 3         3     0.833  0.500  1.73   0.333
#> 4         4     0.167  0.5    1.73   0.333

相关内容

最新更新

热门标签：