Passing a variable into a filter - R dplyr



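Here is a sample of the data set I have. I am looking for the state with the highest number of stores (in this case, CA), and I also want to see how many IDs come from that state.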

| ID  | State | Stores |
| --- | ----- | ------ |
| a11 | CA    | 16585  |
| a12 | CA    | 45552  |
| a13 | AK    | 7811   |
| a14 | MA    | 4221   |

I have this code using dplyr:

max_state <- df %>%
  group_by(State) %>%
  summarise(total_stores = sum(Stores)) %>%
  top_n(1, total_stores) %>%
  select(State)

This gives me "CA".

Can I pass this variable, max_state, into a filter and use summarise(n()) to count the number of IDs for CA?

A few ways:

# this takes your max_state (CA) and brings in the parts of 
# your original table that have the same State
max_state %>%
  left_join(df, by = "State") %>%
  summarize(n = n())

# filter the State in df to match the State in max_state
df %>%
  filter(State == max_state$State) %>%
  summarize(n = n())

# Add Stores_total for each State, drop the grouping so that max() is
# taken over the whole table, keep only the rows of the max State,
# and count the # of IDs therein
df %>%
  group_by(State) %>%
  mutate(Stores_total = sum(Stores)) %>%
  ungroup() %>%
  filter(Stores_total == max(Stores_total)) %>%
  count(State)
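
The question asks about passing the result into `filter()` as a variable. A minimal sketch of that idea, assuming the sample data from the question: `pull()` extracts the State column as a plain vector, and `%in%` keeps working if several states tie for the maximum. The names `top_state` and `n_ids` are illustrative and do not come from the original post.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID     = c("a11", "a12", "a13", "a14"),
  State  = c("CA", "CA", "AK", "MA"),
  Stores = c(16585, 45552, 7811, 4221)
)

# pull() returns the State column as a vector instead of a one-column table
top_state <- df %>%
  group_by(State) %>%
  summarise(total_stores = sum(Stores)) %>%
  filter(total_stores == max(total_stores)) %>%
  pull(State)

# %in% compares against every element of top_state, so ties are handled too
n_ids <- df %>%
  filter(State %in% top_state) %>%
  summarise(n = n())
```

Because `top_state` is an ordinary vector, it can be reused in any later `filter()` call without the `$State` lookup.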

You can combine multiple operations in a single summarize call; they are all applied to the same groups:

df |>
  group_by(State) |>
  summarize(gsum = sum(Stores), nids = n()) |>
  filter(gsum == max(gsum))
##> # A tibble: 1 × 3
##>   State  gsum  nids
##>   <chr> <dbl> <int>
##> 1 CA    62137     2

where the data set df was created with:

df <- data.frame(ID = c("a11", "a12", "a13", "a14"),
                 State = c("CA", "CA", "AK", "MA"),
                 Stores = c(16585, 45552, 7811, 4221))
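
As a possible variation on the single-summarize approach, the row with the largest total can also be selected with `slice_max()`, which dplyr 1.0.0 introduced as the successor to `top_n()`. This is only a sketch under the assumption that dplyr 1.0.0 or later is installed; the column names `total_stores` and `n_ids` and the object `result` are illustrative.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID     = c("a11", "a12", "a13", "a14"),
  State  = c("CA", "CA", "AK", "MA"),
  Stores = c(16585, 45552, 7811, 4221)
)

# One row per State with its total stores and number of IDs,
# then keep the row(s) with the largest total
result <- df %>%
  group_by(State) %>%
  summarise(total_stores = sum(Stores), n_ids = n()) %>%
  slice_max(total_stores, n = 1, with_ties = TRUE)
```

With `with_ties = TRUE` (the default), every state sharing the maximum total is returned; setting it to `FALSE` would keep exactly one row.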
