我试图计算包含大于或等于定义序列中的数字的值的行数,并分组在第二个变量上。例如,另一列中按公司A、公司B和公司C分组的值大于或等于300、400、500的行数。在Excel中,我只会使用COUNTIFS函数,但我不想通过Excel来完成这项任务。
示例数据集和期望结果:
library(tidyverse)
df <- tibble::tribble(
~Company, ~Sales,
"B", 902L,
"B", 575L,
"C", 194L,
"C", 215L,
"A", 515L,
"B", 728L,
"A", 910L,
"C", 889L,
"A", 854L,
"B", 230L,
"C", 188L,
"C", 442L,
"A", 174L,
"A", 723L,
"B", 904L,
"A", 761L,
"B", 987L,
"B", 521L,
"B", 694L,
"B", 530L,
"C", 165L,
"A", 507L,
"B", 316L,
"A", 452L,
"A", 342L,
"B", 413L,
"B", 121L,
"A", 650L,
"B", 801L,
"C", 100L
)
result <- tibble::tribble(
~Company, ~Greater.or.equal.to, ~Count,
"A", 300L, 9L,
"A", 400L, 8L,
"A", 500L, 7L,
"A", 600L, 5L,
"A", 700L, 4L,
"A", 800L, 2L,
"A", 900L, 1L,
"A", 1000L, 0L,
"B", 300L, 11L,
"B", 400L, 10L,
"B", 500L, 9L,
"B", 600L, 6L,
"B", 700L, 5L,
"B", 800L, 4L,
"B", 900L, 3L,
"B", 1000L, 0L,
"C", 300L, 2L,
"C", 400L, 2L,
"C", 500L, 1L,
"C", 600L, 1L,
"C", 700L, 1L,
"C", 800L, 1L,
"C", 900L, 0L,
"C", 1000L, 0L
)
我知道如何使用基数R或dplyr(更熟悉Tidyverse)找到单独的行,但还没能找到一种方法来检查值的序列。我试着创建一个for循环,希望得到正确的答案,但显然是做错了什么。
# These two versions work but is inefficient for when we have a long sequence of variables to check against
length(which(df$Company == "A" & df$Sales >= 300))
df %>%
group_by(Company) %>%
summarise(count = sum(Sales >= 300))
# Attempt at a loop
# Sequence of values to loop over. The number sequence can change as the column we're checking
# against are changing
sequence <- seq(300, 1000, 100)
companies <- c("A", "B", "C")
counting <- function(data, col1, col2, range1, range2){
for (i in range1){
for (j in range2){
length(which(data$col1 == i & data$col2 >= j))
}
}
}
counting(df, Company, Sales, companies, sequence)
任何建议都是非常感谢的!
我们可以对seq
从300到1000进行循环,by
100,filter
使用循环值按'Company'分组后的数据,创建具有行数的summarise
d列(n()
),绑定list
元素,并使用complete
到fill
为'Count'列使用'0'的缺失组合
library(dplyr)
library(purrr)
library(tidyr)
out <- map(seq(300, 1000, by = 100), ~
df %>%
group_by(Company) %>%
filter(Sales >= .x) %>%
summarise(Greater.or.equal.to = .x, Count = n())) %>%
bind_rows %>%
complete(Company, Greater.or.equal.to = seq(300, 1000,
by = 100), fill = list(Count = 0))
与产出
out
# A tibble: 24 × 3
Company Greater.or.equal.to Count
<chr> <dbl> <dbl>
1 A 300 9
2 A 400 8
3 A 500 7
4 A 600 5
5 A 700 4
6 A 800 2
7 A 900 1
8 A 1000 0
9 B 300 11
10 B 400 10
# … with 14 more rows
> all.equal(out, result)
[1] TRUE