r - Summarizing linear regression slopes



New R user here.

I have a dataset of about 400 sites, and I am trying to get the standard deviation of p and the regression slope for each site.

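I have gotten part of the way with the code below, but I don't know how to handle the last part of the problem: fitting a linear regression to each site separately, extracting the slope of the line, and creating another column that holds the slope for each site.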

Any help is appreciated!

library(dplyr)
# Sample df
df <- data.frame(site.id = c("1", "1", "2", "2", "3", "3"),
                 year = c("2019", "2020", "2019", "2020", "2019", "2020"),
                 p = c(107, 101, 114, 117, 97, 89))
print(df)
# Summarize
df.sum <- df %>%
  group_by(site.id) %>%
  summarise(p.sd = sd(p))
print(df.sum)

Try either of the following:

# 1
df %>%
  mutate(year = as.numeric(year)) %>%
  group_by(site.id) %>%
  summarise(p.sd = sd(p), slope = cov(p, year) / var(year))
# 2
df %>%
  mutate(year = as.numeric(year)) %>%
  group_by(site.id) %>%
  summarise(p.sd = sd(p), slope = coef(lm(p ~ year))[[2]])
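Options #1 and #2 should give the same slope: for a simple regression with a single predictor, the least-squares slope equals cov(p, year) / var(year), which is the year coefficient that lm() returns. A quick sanity check on the sample data, assuming dplyr is installed (the names res, slope_cov and slope_lm are just illustrative):

```r
library(dplyr)

df <- data.frame(site.id = c("1", "1", "2", "2", "3", "3"),
                 year    = c("2019", "2020", "2019", "2020", "2019", "2020"),
                 p       = c(107, 101, 114, 117, 97, 89))

res <- df %>%
  mutate(year = as.numeric(year)) %>%
  group_by(site.id) %>%
  summarise(slope_cov = cov(p, year) / var(year),   # closed-form OLS slope (option #1)
            slope_lm  = coef(lm(p ~ year))[[2]])    # slope coefficient from lm() (option #2)

# The two columns should agree up to floating-point error
all.equal(res$slope_cov, res$slope_lm)
```

Option #1 avoids fitting a full model per site, which may matter with many sites; option #2 generalizes more easily if you later want intercepts, standard errors or extra predictors.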

If we know that every site.id has exactly 2 rows, as is the case in the sample data, then this also works:

# 3 - only if every site.id has exactly 2 rows
df %>%
  mutate(year = as.numeric(year)) %>%
  group_by(site.id) %>%
  summarise(p.sd = sd(p), slope = diff(p) / diff(year))
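With around 400 real sites it is worth checking the two-rows-per-site assumption before relying on option #3, because for a site with more rows diff() returns more than one value and the result is no longer a single slope. A minimal sketch, assuming dplyr; the check and the error message are illustrative additions, not part of the original answer:

```r
library(dplyr)

df <- data.frame(site.id = c("1", "1", "2", "2", "3", "3"),
                 year    = c("2019", "2020", "2019", "2020", "2019", "2020"),
                 p       = c(107, 101, 114, 117, 97, 89))

# Option #3 is only valid when every site has exactly two rows,
# so verify that assumption first
rows_per_site <- count(df, site.id)
if (any(rows_per_site$n != 2)) {
  stop("Some site.id values do not have exactly 2 rows; use option #1 or #2")
}

res3 <- df %>%
  mutate(year = as.numeric(year)) %>%
  group_by(site.id) %>%
  summarise(p.sd = sd(p), slope = diff(p) / diff(year))
```

For exactly two points, the line through them is the least-squares fit, so this slope matches options #1 and #2.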

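If we know that every site.id has exactly 2 rows and consecutive years, so that diff(year) equals 1 (again the case in the sample data), then it can be simplified to: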

# 4 - only if every site.id has exactly 2 rows & consecutive years
df %>%
  group_by(site.id) %>%
  summarise(p.sd = sd(p), slope = diff(p))
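Option #4 also depends on row order: diff(p) subtracts the earlier row from the later one, so the sign of the slope is only correct when rows are sorted by year within each site. (Option #3 is unaffected, since reversing the rows flips the sign of both diff(p) and diff(year).) A sketch that sorts with dplyr's arrange() first; the shuffled data below is made up purely to show the effect:

```r
library(dplyr)

# Sample data with site 1's rows deliberately listed in reverse year order
df <- data.frame(site.id = c("1", "1", "2", "2", "3", "3"),
                 year    = c("2020", "2019", "2019", "2020", "2019", "2020"),
                 p       = c(101, 107, 114, 117, 97, 89))

res4 <- df %>%
  arrange(site.id, year) %>%   # make diff() run from the earlier year to the later one
  group_by(site.id) %>%
  summarise(p.sd = sd(p), slope = diff(p))
```

Sorting the character year column works here because all years have four digits; converting with as.numeric(year) first is the safer general choice.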

Note

We used the input from the question:

df <- data.frame(site.id = c("1", "1", "2", "2", "3", "3"),
                 year = c("2019", "2020", "2019", "2020", "2019", "2020"),
                 p = c(107, 101, 114, 117, 97, 89))
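The question asks for "another column" with the slope for each site. If you want that value repeated on every original row rather than one summary row per site, a minimal sketch is to use mutate() on the grouped data instead of summarise(), assuming dplyr (df_with_slope is an illustrative name):

```r
library(dplyr)

df <- data.frame(site.id = c("1", "1", "2", "2", "3", "3"),
                 year    = c("2019", "2020", "2019", "2020", "2019", "2020"),
                 p       = c(107, 101, 114, 117, 97, 89))

df_with_slope <- df %>%
  mutate(year = as.numeric(year)) %>%
  group_by(site.id) %>%
  # mutate() keeps every row; each site's sd and slope are repeated on its rows
  mutate(p.sd  = sd(p),
         slope = cov(p, year) / var(year)) %>%
  ungroup()
```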

# Sample df
df <- data.frame(site.id = c(1, 1, 2, 2, 3, 3),
                 year = c(2019, 2020, 2019, 2020, 2019, 2020),
                 p = c(107, 101, 114, 117, 97, 89))
# split by site.id, fit lm and extract the slope coefficient
# (the native pipe |> requires R >= 4.1)
regression_slopes_list <- split(df, df$site.id) |>
  lapply(function(x) {
    lm(p ~ year, data = x)$coefficients[2] |>
      as.numeric()
  })
# transform the list into a data.frame
slopes_df <- data.frame(slope = unlist(regression_slopes_list),
                        site.id = names(regression_slopes_list))
# get sd by site.id
sd_df <- tapply(df$p, df$site.id, sd) |>
  as.data.frame() |>
  `colnames<-`('sd')
sd_df$site.id <- rownames(sd_df)
# merge the slope and sd data.frames into the sample df
df <- merge(df, slopes_df, by = 'site.id') |>
  merge(sd_df, by = 'site.id')
print(df)
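The same result can also be built in base R with a single pass over split(), returning one summary row per site and merging only once. A minimal sketch, assuming the numeric sample data above (per_site and df_merged are illustrative names):

```r
df <- data.frame(site.id = c(1, 1, 2, 2, 3, 3),
                 year    = c(2019, 2020, 2019, 2020, 2019, 2020),
                 p       = c(107, 101, 114, 117, 97, 89))

# One summary row per site: standard deviation of p and the lm() slope of p on year
per_site <- do.call(rbind, lapply(split(df, df$site.id), function(x) {
  data.frame(site.id = x$site.id[1],
             sd      = sd(x$p),
             slope   = unname(coef(lm(p ~ year, data = x))[2]))
}))
rownames(per_site) <- NULL

# Attach the per-site values to every original row
df_merged <- merge(df, per_site, by = "site.id")
```

Keeping per_site as its own object also gives you the one-row-per-site table directly, which is what the dplyr summarise() answers return.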
