r - Using lapply or purrr::map where the results for the first list elements are used for later list elements

Assume the following list:

df1 <- data.frame(id = 1:3,
                  x = 1:3,
                  y = 4:6,
                  level = rep("L1", 3))
df2 <- data.frame(id = 4:6,
                  x = 2:4,
                  y = 5:7,
                  level = rep("L1", 3))
df3 <- data.frame(id = 1:6,
                  x = 1:6,
                  y = 7:12,
                  level = rep("L2", 6))
my_list <- list(df1, df2, df3)

I now want to first "loop" through the data frames with level L1 and, for this example, simply take the sum x + y (row-wise).

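However, when I reach df3 (which has level = L2), I first want to row-bind the individual sums from df1 and df2, match that result to df3 (by the id column), and then, for df3/level = L2, take the sum of x + y + the result from level L1.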

If there were only two levels I could hard-code this, but I may have many levels.

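So my question is: how can I write a function, or get lapply/purrr::map to run some block of code, so that it does something for L1, then feeds those results into the next block for L2, and once we are through L2 we move on to L3, and so on?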

The expected result after running something like lapply(my_list, custom_function) would be:

result1 <- data.frame(id = 1:3,
                      x = 1:3,
                      y = 4:6,
                      level = rep("L1", 3),
                      result = c(5, 7, 9))
result2 <- data.frame(id = 4:6,
                      x = 2:4,
                      y = 5:7,
                      level = rep("L1", 3),
                      result = c(7, 9, 11))
result3 <- data.frame(id = 1:6,
                      x = 1:6,
                      y = 7:12,
                      level = rep("L2", 6),
                      result_L1 = c(5, 7, 9, 7, 9, 11),
                      result = c(13, 17, 21, 21, 25, 29))
final_result <- list(result1, result2, result3)

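To explain the calculation a bit:

df1, first row: x + y is 1 + 4, so result = 5.
When we get to df3, which has level L2, we first match this result (5) to df3 and then take the sum x + y + result_L1 = 1 + 7 + 5 = 13.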
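
The arithmetic for every row can be checked with a few lines of base R. This is only a sketch of the calculation described here, not a general solution: the names l1 and l2 are made up for the check, and the two L1 data frames are typed in directly as one combined data frame.

```r
# Row-wise sums for the two L1 data frames (df1 and df2 stacked by hand)
l1 <- data.frame(id = 1:6, x = c(1:3, 2:4), y = c(4:6, 5:7))
l1$result <- l1$x + l1$y

# The L2 data frame: look up each id's L1 result, then add it to x + y
l2 <- data.frame(id = 1:6, x = 1:6, y = 7:12)
l2$result_L1 <- l1$result[match(l2$id, l1$id)]
l2$result <- l2$x + l2$y + l2$result_L1

l2$result
#> [1] 13 17 21 21 25 29
```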

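The complex operations happen between levels, so I would first reorganize the data into a structure with one data frame per level. Since x + y always needs to be computed, we can do that in the same step: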

library(tidyverse)
lvls <- bind_rows(my_list) %>%
  mutate(result = x + y) %>%
  split(.$level)
lvls
#> $L1
#>   id x y level result
#> 1  1 1 4    L1      5
#> 2  2 2 5    L1      7
#> 3  3 3 6    L1      9
#> 4  4 2 5    L1      7
#> 5  5 3 6    L1      9
#> 6  6 4 7    L1     11
#> 
#> $L2
#>    id x  y level result
#> 7   1 1  7    L2      8
#> 8   2 2  8    L2     10
#> 9   3 3  9    L2     12
#> 10  4 4 10    L2     14
#> 11  5 5 11    L2     16
#> 12  6 6 12    L2     18
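
If the tidyverse is not available, the same regrouping can be sketched in base R. This is an assumed equivalent of the bind_rows() / mutate() / split() step, not part of the original answer; all_rows is a name introduced here for illustration.

```r
# Input list from the question (with y = 7:12 so df3 has six rows)
df1 <- data.frame(id = 1:3, x = 1:3, y = 4:6, level = rep("L1", 3))
df2 <- data.frame(id = 4:6, x = 2:4, y = 5:7, level = rep("L1", 3))
df3 <- data.frame(id = 1:6, x = 1:6, y = 7:12, level = rep("L2", 6))
my_list <- list(df1, df2, df3)

# Stack all data frames, compute the per-row sum once, then split by level
all_rows <- do.call(rbind, my_list)
all_rows$result <- all_rows$x + all_rows$y
lvls <- split(all_rows, all_rows$level)

names(lvls)
#> [1] "L1" "L2"
lvls$L2$result
#> [1]  8 10 12 14 16 18
```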

To add the results of the previous level, we can use accumulate():

lvls %>%
  accumulate(function(l1, l2) {
    l1 %>%
      select(id, result0 = result) %>%
      right_join(l2, by = "id") %>%
      mutate(result = result + result0)
  })
#> $L1
#>   id x y level result
#> 1  1 1 4    L1      5
#> 2  2 2 5    L1      7
#> 3  3 3 6    L1      9
#> 4  4 2 5    L1      7
#> 5  5 3 6    L1      9
#> 6  6 4 7    L1     11
#> 
#> $L2
#>   id result0 x  y level result
#> 1  1       5 1  7    L2     13
#> 2  2       7 2  8    L2     17
#> 3  3       9 3  9    L2     21
#> 4  4       7 4 10    L2     21
#> 5  5       9 5 11    L2     25
#> 6  6      11 6 12    L2     29
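
Because each step only sees the previous, already accumulated level, the same pattern should carry over to more than two levels. Below is a minimal base R sketch using Reduce(accumulate = TRUE) in place of accumulate() and match() in place of the join. The third level L3 and the helper add_previous are made up for illustration and do not come from the question.

```r
# Hypothetical three-level input, already split into one data frame per level
lvls <- list(
  L1 = data.frame(id = 1:3, x = 1:3, y = 4:6),
  L2 = data.frame(id = 1:3, x = 1:3, y = 7:9),
  L3 = data.frame(id = 1:3, x = 1:3, y = 1:3)  # made-up extra level
)
lvls <- lapply(lvls, function(d) transform(d, result = x + y))

# Add the previous level's accumulated result to the current level, by id
add_previous <- function(prev, cur) {
  cur$result0 <- prev$result[match(cur$id, prev$id)]
  cur$result <- cur$result + cur$result0
  cur
}

out <- Reduce(add_previous, lvls, accumulate = TRUE)
names(out) <- names(lvls)  # Reduce() does not keep the list names

out$L3$result
#> [1] 15 21 27
```

Here L3 ends up with its own x + y (2, 4, 6) plus the accumulated L2 result (13, 17, 21), which in turn already contains the L1 sums.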

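Based on the extra context in your comment, I would suggest a different approach. The idea is still to accumulate the weights across levels, but I would push the rest of the logic into a helper function that handles each level: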

find_weights <- function(data, balance, stratify = character(), weights = "wt") {
  if (length(stratify) > 0) {
    # Apply separately to strata
    strata <- data[stratify]

    data <- split(data, strata) |>
      lapply(find_weights, balance, weights = weights) |>
      unsplit(strata)

    return(data)
  }

  # Get initial weights
  w0 <- data[[weights]]
  if (is.null(w0)) {
    w0 <- rep_len(1, nrow(data))
  }

  # Find balancing weights (use your function)
  f <- factor(data[[balance]])
  n <- as.numeric(tapply(w0, f, sum))
  w <- prop.table(1 / n[f]) * sum(w0)

  # Update weights
  data[[weights]] <- w0 * w

  data
}

Some example data:

data <- data.frame(country = rep(c("A", "B"), c(3, 5)), sex = c("m", "f"))
data
#>   country sex
#> 1       A   m
#> 2       A   f
#> 3       A   m
#> 4       B   f
#> 5       B   m
#> 6       B   f
#> 7       B   m
#> 8       B   f

And the level descriptions:

opts <- list(
l1 = list(balance = "sex", stratify = "country"),
l2 = list(balance = "country")
)

Then apply them, accumulating the weights:

Reduce(function(data, opts) {
  find_weights(data, opts$balance, opts$stratify)
}, opts, data, accumulate = TRUE)
#> [[1]]
#>   country sex
#> 1       A   m
#> 2       A   f
#> 3       A   m
#> 4       B   f
#> 5       B   m
#> 6       B   f
#> 7       B   m
#> 8       B   f
#> 
#> [[2]]
#>   country sex        wt
#> 1       A   m 0.7500000
#> 2       A   f 1.5000000
#> 3       A   m 0.7500000
#> 4       B   f 0.8333333
#> 5       B   m 1.2500000
#> 6       B   f 0.8333333
#> 7       B   m 1.2500000
#> 8       B   f 0.8333333
#> 
#> [[3]]
#>   country sex        wt
#> 1       A   m 1.0000000
#> 2       A   f 2.0000000
#> 3       A   m 1.0000000
#> 4       B   f 0.6666667
#> 5       B   m 1.0000000
#> 6       B   f 0.6666667
#> 7       B   m 1.0000000
#> 8       B   f 0.6666667
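
One way to sanity-check the last element is to sum the weights per group. The sketch below does not rerun find_weights(); it retypes the final weights from the printed output (with 0.6666667 entered as 2/3, which is an assumption about the exact value), and final is a name introduced here.

```r
# Final weights copied from the last element of the output above
final <- data.frame(
  country = rep(c("A", "B"), c(3, 5)),
  sex = rep_len(c("m", "f"), 8),
  wt = c(1, 2, 1, 2/3, 1, 2/3, 1, 2/3)
)

# Weighted totals per country: both come out as 4
by_country <- tapply(final$wt, final$country, sum)
by_country

# Weighted totals per sex within each country: all four cells come out as 2
by_cell <- tapply(final$wt, final[c("country", "sex")], sum)
by_cell

# The total weight still equals the number of rows
sum(final$wt)
```

With these values the countries carry equal total weight, and so do the sexes within each country, which is what the two level descriptions ask for.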
