我有一个数据帧,每个唯一的行id(subjectID(都有多个观测值,看起来像这样(一个简短的例子(:
df <- data.frame(subjectID = rep(c(1:4), each=3),
y = sample(0:2, 4, replace = TRUE),
x = rep(c(0:2), times = 4))
df <- as.tibble(df)
> df
# A tibble: 12 x 3
subjectID y x
<int> <int> <int>
1 1 0 0
2 1 1 1
3 1 1 2
4 2 2 0
5 2 0 1
6 2 1 2
7 3 1 0
8 3 2 1
9 3 0 2
10 4 1 0
11 4 1 1
12 4 2 2
对于每个唯一的subjectID";unique(df$subjectID(";(即上面例子中的4(我想:
#1. regress y on x (3 x- and y-observations for each subjectID ) to obtain coefficients
reg <- lm(y~x)
#2. store intercept
intercept <- reg$coefficients[1]
#3. store slope
slope <- reg$coefficients[2]
#4. calculate Spearman's rho with library(ggpubr)
rho <- cor(x, y, method = c("spearman"))
#5. calculate the Spearman p-value with library(ggpubr)
pvalue <- cor.test(x, y, method=c("spearman"))
我想将结果存储在一个新的数据帧中,如下所示:
# creating an data frame to store regression output for each unique subjectID
output <- tibble(
subjectID = rep(c(1:4), each=1),
intercept = rep(c(0), each=4),
slope = rep(c(0), each=4),
rho = rep(c(0), each=4),
pvalue = rep(c(0), each=4))
> output
# A tibble: 4 x 5
subjectID intercept slope rho pvalue
<int> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 0 0
2 2 0 0 0 0
3 3 0 0 0 0
4 4 0 0 0 0
对于每个唯一的subjectID,我有21个x和y-obs,在"真实"数据帧中有100多个唯一的subject ID,所以我希望避免手动操作。所以我想知道是否可以创建一个循环函数来实现这一点?
在dplyr或base R中,我尝试使用管道和group_by(subjectID(,但无法设置函数。
这有点冗长,但它符合@GregorThomas所指的:
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
set.seed(41)
df <- data.frame(
subjectID = rep(c(1:4), each = 3),
y = sample(0:2, 4, replace = TRUE),
x = rep(c(0:2), times = 4)
)
df <- as_tibble(df)
df %>%
nest(c(y, x)) %>%
mutate(reg = map(data, ~ lm(y ~ x, data = .x))) %>%
mutate(cor = map(data, ~ with(.x, cor.test(y, x)))) %>%
mutate(tidied_reg = map(reg, tidy), tidied_cor = map(cor, tidy)) %>%
select(subjectID, tidied_reg, tidied_cor) %>%
unnest() %>%
select(subjectID, term, estimate, statistic1, p.value1) %>%
pivot_wider(names_from = term, values_from = estimate) %>%
select(subjectID, intercept = `(Intercept)`, slope = x, rho = statistic1, pvalue = p.value1)
#> # A tibble: 4 x 5
#> subjectID intercept slope rho pvalue
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1.5 -0.5 -0.577 0.667
#> 2 2 1.5 -0.5 -0.577 0.667
#> 3 3 0.833 0.500 1.73 0.333
#> 4 4 0.167 0.5 1.73 0.333