Apply a specific function over a list under certain conditions (for loop and/or function)



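I have a 20-year measurement dataset (14600x6) and need the geometric mean of $tu for each $name within each $trophic. Initially I split my df into three dfs and did this:

Old code, based on the split dfs!!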

trophic_pp <- df_pp %>%
  select(sites, name, tu_pp) %>%
  group_by(name) %>%
  mutate(row = row_number()) %>%
  pivot_wider(names_from = name, values_from = tu_pp) %>%
  replace(is.na(.), 0) %>%
  select(-row)
trophic_dc <- ...... same
trophic_pt <- ...... same

Then

trophic_pp <- trophic_pp %>%
  mutate(sum_pp = rowSums(across(where(is.numeric))))
trophic_dc <- ...... same
trophic_pt <- ...... same

Then

# select() needs the data frame itself as input, not its name as a string
trophic_pp_sites <- trophic_pp %>%
  select(sites, sum_pp) %>%
  group_by(sites) %>%
  summarise(gmean = gmean(sum_pp)) %>%
  add_column(trophic = "pp", .before = "gmean")
trophic_dc <- ...... same
trophic_pt <- ...... same

Then I merged and reduced everything for the final plot

all_trophic <- Reduce(
  function(x, y) merge(x, y, all = TRUE),
  list(trophic_pp, trophic_dc, trophic_pt)
) %>%
  mutate(type = case_when(
    startsWith(sites, "R") ~ "river",
    startsWith(sites, "T") ~ "tributary"
  ))

As you can see, this is long and repetitive code.

I rearranged my data into a single df instead of three, and its str now looks like this:

tibble [14,100 x 6] (S3: tbl_df/tbl/data.frame)
$ name             : Factor w/ 6 levels "Al","As","Cu",..: 1 1 1 1 1 1 1 1 1 1 ...
$ cas              : chr [1:14100] "7429-90-5" "7429-90-5" "7429-90-5" "7429-90-5" ...
$ sites            : chr [1:14100] "R1" "R1" "R1" "R5" ...
$ conc             : num [1:14100] 12.12 12.12 12.12 2.06 2.06 ...
$ trophic          : chr [1:14100] "tu_pp" "tu_pc" "tu_sc" "tu_pp" ...
$ tu               : num [1:14100] 12.41 4.83 7.22 2.11 0.82 ...

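Each $name has its own $cas, there are 9 $sites, and each $tu is calculated from $conc for each of the three $trophic levels. So $tu is the only variable that changes from one row to the next.

I am struggling to calculate the geometric mean. I tried the following:

Define the geometric mean function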

gmean <- function(x, na.rm = TRUE) {
  # pass na.rm through to mean() and return the value visibly
  exp(mean(log(x), na.rm = na.rm))
}

Create a list based on $trophic

trophic_list <- split(df, df$trophic)

and run the function over the list with lapply

for (i in seq_along(trophic_list)) {

trophic_list[[i]] <- within(trophic_list[[i]], {
gmean <- lapply(trophic_list[tu], FUN: gmean

})
}

Sorry for the long explanation, and thank you for your help.

If you can use the tidyverse, here is one way to achieve what you want:

library(tidyverse)
# use the mpg cars data to play with
cars <- mpg
# function for the geometric mean
# from here https://stackoverflow.com/questions/2602583/geometric-mean-is-there-a-built-in
geo_mean <- function(x, na.rm = TRUE) {
  exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
}
# calculate the geometric mean per manufacturer and year
# in your case, group by trophic/name
geo_mean_summary <- cars %>%
  group_by(manufacturer, year) %>%
  summarize(geoMean_City = geo_mean(cty),
            geoMean_HWY = geo_mean(hwy))
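If you would rather stay with the list idea from the question, the same grouping can probably be done in base R with split() and sapply(), or with aggregate() for the trophic/name combinations. This is only a minimal sketch: the toy df below is made up to mimic the columns shown in your str() output, the values are invented, and it assumes all $tu values are strictly positive.

```r
# Toy data shaped like the question's df; all values are made up for illustration.
df <- data.frame(
  name    = rep(c("Al", "Cu"), each = 6),
  sites   = rep(rep(c("R1", "T1"), each = 3), times = 2),
  trophic = rep(c("tu_pp", "tu_pc", "tu_sc"), times = 4),
  tu      = c(1, 2, 4, 8, 16, 32, 2, 2, 2, 4, 4, 4)
)

# Geometric mean as exp of the mean of logs; assumes strictly positive input.
gmean <- function(x, na.rm = TRUE) {
  exp(mean(log(x), na.rm = na.rm))
}

# List route: one data frame per trophic level, then one value per list element.
trophic_list <- split(df, df$trophic)
gmean_by_trophic <- sapply(trophic_list, function(d) gmean(d$tu))

# One geometric mean per trophic/name combination, returned as a data frame.
gmean_by_group <- aggregate(tu ~ trophic + name, data = df, FUN = gmean)
```

No for loop or within() is needed here: sapply() already visits every element of the list and collects one number per trophic level.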

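If it applies to your case, pay attention to the comments in that post about how to handle negative values, zeros, or missing values.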
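To see why that matters, here is a small sketch comparing the plain definition with the geo_mean variant used above. The name gmean_plain and the three test vectors are made up for illustration; which behaviour is appropriate for your $tu values is something only you can decide.

```r
# Plain definition: exp of the mean of logs.
# A single zero gives log(0) = -Inf, so the whole result collapses to 0.
gmean_plain <- function(x, na.rm = TRUE) {
  exp(mean(log(x), na.rm = na.rm))
}

# Variant from the answer: non-positive values are left out of the sum,
# but the divisor is still length(x), so zeros and NAs still count there.
geo_mean <- function(x, na.rm = TRUE) {
  exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
}

x_pos  <- c(1, 2, 4)   # strictly positive
x_zero <- c(0, 2, 8)   # contains a zero
x_na   <- c(2, 8, NA)  # contains a missing value

res <- data.frame(
  case  = c("positive", "zero", "NA"),
  plain = c(gmean_plain(x_pos), gmean_plain(x_zero), gmean_plain(x_na)),
  geo   = c(geo_mean(x_pos), geo_mean(x_zero), geo_mean(x_na))
)
print(res)
```

The two functions agree on strictly positive data. With a zero, the plain version returns 0 while geo_mean returns 16^(1/3), about 2.52. With an NA, the plain version drops it and returns 4, while geo_mean still divides by 3 and again returns about 2.52.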
