Writing a function to perform conditional summarization in R using a named list

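I'm trying to write a function that takes a tibble and a list of filter specifications, and performs a conditional summary based on those filter specifications.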

library(dplyr)  # also re-exports tibble()

# Sample DF with a column to summarize and 2 ID columns.
df <- tibble(
  to_summarize = c(1, 2, 8, 9),
  ID1 = c('A', 'A', 'C', 'A'),
  ID2 = c('X', 'Y', 'Z', 'X')
)

We can conditionally summarize using both IDs (returns 10) or using only one ID (returns 12):

df %>%
  summarize(
    total1 = sum(to_summarize[ID1 == 'A' & ID2 == 'X']),
    total2 = sum(to_summarize[ID1 == 'A'])
  )

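I'd like to allow the same flexibility in a function. The user should be able to supply either a list of filters or an empty list (in which case the summary function is applied to the whole column, without any filtering).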

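I figured the simplest approach would be a named list, where each name is a column to filter on and each value is the value to keep in that column: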

filters <- list(
  ID1 = 'A',
  ID2 = 'X'
)
# Here is my attempt at a function to implement this:
summarise_and_filter <- function(df, filters) {
  df %>%
    summarise(
      total = sum(to_summarize[names(filters) == unname(unlist(filters))])
    )
}
# It does not work, it just returns zero
df %>%
  summarise_and_filter(
    filters = filters
  )
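To see why this attempt returns zero, the indexing condition can be evaluated on its own. The minimal sketch below rebuilds the sample data; `cond` is a name introduced only for illustration. The comparison pits the filter names against the filter values and never touches the columns of `df`.

```r
library(tibble)

df <- tibble(
  to_summarize = c(1, 2, 8, 9),
  ID1 = c('A', 'A', 'C', 'A'),
  ID2 = c('X', 'Y', 'Z', 'X')
)
filters <- list(ID1 = 'A', ID2 = 'X')

# names(filters) is c("ID1", "ID2") and unname(unlist(filters)) is c("A", "X"),
# so this compares strings with strings and never looks at the data in df.
cond <- names(filters) == unname(unlist(filters))
print(cond)

# The length-2 logical index is recycled over the 4 rows and selects nothing,
# so the sum of the empty selection is 0.
print(sum(df$to_summarize[cond]))
```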
# I imagine the function might need to call map in some way, or perhaps imap?
library(purrr)
map_summarise_and_filter <- function(df, filters) {
  df %>%
    summarise(
      total = sum(
        to_summarize[
          imap_lgl(
            filters,
            ~ .y == .x
          )
        ]
      )
    )
}
# But this also returns zero
df %>%
  map_summarise_and_filter(
    filters = filters
  )
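The `imap_lgl()` attempt fails for a similar reason: `imap()` passes each value as `.x` and its name as `.y`, so the lambda compares the string `"ID1"` with `"A"` and `"ID2"` with `"X"`. A minimal sketch of the difference, where `wrong` and `right` are illustrative names only: using the name to look up the column with `df[[.y]]` yields one row-wise logical vector per filter.

```r
library(purrr)
library(tibble)

df <- tibble(
  to_summarize = c(1, 2, 8, 9),
  ID1 = c('A', 'A', 'C', 'A'),
  ID2 = c('X', 'Y', 'Z', 'X')
)
filters <- list(ID1 = 'A', ID2 = 'X')

# Compares each filter's name with its value: both comparisons are FALSE.
wrong <- imap_lgl(filters, ~ .y == .x)
print(wrong)

# Uses the name to fetch the column, giving one logical vector per filter.
right <- imap(filters, ~ df[[.y]] == .x)
print(right)
```

These per-filter vectors still have to be combined (for example with `&`) before they can subset `to_summarize`.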

There are two operations being done here, and one of them (total1) can be computed dynamically from filters:

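library(dplyr)
df %>%
  # total2 uses only ID1, so compute it before any rows are dropped
  mutate(total2 = sum(to_summarize[ID1 == filters[['ID1']]])) %>%
  # keep rows where every ID column equals its value in filters;
  # if_all() replaces filter(across(...)), which newer dplyr releases deprecate
  filter(if_all(starts_with("ID"), ~ .x == filters[[cur_column()]])) %>%
  summarise(total1 = sum(to_summarize), total2 = first(total2))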

Output:

# A tibble: 1 x 2
  total1 total2
   <dbl>  <dbl>
1     10     12
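The answer above hard-codes `starts_with("ID")` and assumes every ID column has an entry in `filters`, so it does not cover the empty-list case from the question. One possible way to generalize the `filter()` route, assuming rlang is installed, is to turn each name/value pair into an expression such as `.data[["ID1"]] == "A"` and splice them all into `filter()`. `filter_then_summarise()` is a hypothetical helper name, not part of dplyr.

```r
library(dplyr)
library(purrr)
library(rlang)

df <- tibble(
  to_summarize = c(1, 2, 8, 9),
  ID1 = c('A', 'A', 'C', 'A'),
  ID2 = c('X', 'Y', 'Z', 'X')
)

# Hypothetical helper: one equality expression per filter, spliced into filter().
filter_then_summarise <- function(df, filters) {
  if (length(filters) > 0) {
    # imap() passes (value, name); build e.g. .data[["ID1"]] == "A"
    conds <- imap(filters, function(value, col) expr(.data[[!!col]] == !!value))
    # filter() rejects named arguments, so drop the list names before splicing
    df <- filter(df, !!!unname(conds))
  }
  # With an empty filter list, no rows are dropped
  summarise(df, total = sum(to_summarize))
}

filter_then_summarise(df, list(ID1 = 'A', ID2 = 'X'))
filter_then_summarise(df, list(ID1 = 'A'))
filter_then_summarise(df, list())
```

Only the columns named in `filters` are touched, so a partial list such as `list(ID1 = 'A')` works without listing `ID2`.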

If we want to do this without filter, reduce the output of across to a single logical vector and use that to subset:

library(purrr)
df %>%
  summarise(
    total1 = sum(to_summarize[
      across(starts_with('ID'), ~ .x == filters[[cur_column()]]) %>%
        reduce(`&`)
    ]),
    total2 = sum(to_summarize[ID1 == filters[['ID1']]])
  )

Output:

# A tibble: 1 x 2
  total1 total2
   <dbl>  <dbl>
1     10     12
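The same `reduce()` idea also fits the function signature the question asks for. In this minimal sketch, assuming dplyr and purrr are installed, `imap()` builds one logical vector per filter by looking up each named column, and `reduce()` combines them with `&`. Seeding the reduction with `.init = TRUE` means an empty filter list yields a single `TRUE`, which selects every row. `summarise_with_filters()` is a hypothetical name chosen for illustration.

```r
library(dplyr)
library(purrr)

df <- tibble(
  to_summarize = c(1, 2, 8, 9),
  ID1 = c('A', 'A', 'C', 'A'),
  ID2 = c('X', 'Y', 'Z', 'X')
)

# Hypothetical helper: AND together one condition per named filter.
summarise_with_filters <- function(df, filters) {
  keep <- filters %>%
    imap(~ df[[.y]] == .x) %>%      # one logical vector per filter column
    reduce(`&`, .init = TRUE)       # list() reduces to TRUE, i.e. keep all rows
  df %>%
    summarise(total = sum(to_summarize[keep]))
}

summarise_with_filters(df, list(ID1 = 'A', ID2 = 'X'))
summarise_with_filters(df, list(ID1 = 'A'))
summarise_with_filters(df, list())
```

Because the conditions are built outside `summarise()`, the helper does not depend on the ID columns sharing a common prefix.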
