r语言 - 计算值大于或等于序列中每个值的行数



我试图计算包含大于或等于定义序列中的数字的值的行数,并分组在第二个变量上。例如,另一列中按公司A、公司B和公司C分组的值大于或等于300、400、500的行数。在Excel中,我只会使用COUNTIFS函数,但我不想通过Excel来完成这项任务。

示例数据集和期望结果:

library(tidyverse)
df <- tibble::tribble(
~Company, ~Sales,
"B",   902L,
"B",   575L,
"C",   194L,
"C",   215L,
"A",   515L,
"B",   728L,
"A",   910L,
"C",   889L,
"A",   854L,
"B",   230L,
"C",   188L,
"C",   442L,
"A",   174L,
"A",   723L,
"B",   904L,
"A",   761L,
"B",   987L,
"B",   521L,
"B",   694L,
"B",   530L,
"C",   165L,
"A",   507L,
"B",   316L,
"A",   452L,
"A",   342L,
"B",   413L,
"B",   121L,
"A",   650L,
"B",   801L,
"C",   100L
)
result <- tibble::tribble(
~Company, ~Greater.or.equal.to, ~Count,
"A",                 300L,     9L,
"A",                 400L,     8L,
"A",                 500L,     7L,
"A",                 600L,     5L,
"A",                 700L,     4L,
"A",                 800L,     2L,
"A",                 900L,     1L,
"A",                1000L,     0L,
"B",                 300L,    11L,
"B",                 400L,    10L,
"B",                 500L,     9L,
"B",                 600L,     6L,
"B",                 700L,     5L,
"B",                 800L,     4L,
"B",                 900L,     3L,
"B",                1000L,     0L,
"C",                 300L,     2L,
"C",                 400L,     2L,
"C",                 500L,     1L,
"C",                 600L,     1L,
"C",                 700L,     1L,
"C",                 800L,     1L,
"C",                 900L,     0L,
"C",                1000L,     0L
)

我知道如何使用基数R或dplyr(更熟悉Tidyverse)找到单独的行,但还没能找到一种方法来检查值的序列。我试着创建一个for循环,希望得到正确的答案,但显然是做错了什么。

# These two versions work but is inefficient for when we have a long sequence of variables to check against
length(which(df$Company == "A" & df$Sales >= 300))
df %>% 
group_by(Company) %>% 
summarise(count = sum(Sales >= 300))
# Attempt at a loop
# Sequence of values to loop over. The number sequence can change as the column we're checking
# against are changing
sequence <- seq(300, 1000, 100)
companies <- c("A", "B", "C")
counting <- function(data, col1, col2, range1, range2){
for (i in range1){
for (j in range2){
length(which(data$col1 == i & data$col2 >= j))
}
}
}
counting(df, Company, Sales, companies, sequence)

任何建议都是非常感谢的!

我们可以对seq从300到1000进行循环,by100,filter使用循环值按'Company'分组后的数据,创建具有行数的summarised列(n()),绑定list元素,并使用completefill为'Count'列使用'0'的缺失组合

library(dplyr)
library(purrr)
library(tidyr)
out <- map(seq(300, 1000, by = 100), ~ 
df %>%
group_by(Company) %>% 
filter(Sales >= .x) %>% 
summarise(Greater.or.equal.to = .x, Count = n())) %>%
bind_rows %>%
complete(Company, Greater.or.equal.to = seq(300, 1000,
by = 100), fill = list(Count = 0))

与产出

out
# A tibble: 24 × 3
Company Greater.or.equal.to Count
<chr>                 <dbl> <dbl>
1 A                       300     9
2 A                       400     8
3 A                       500     7
4 A                       600     5
5 A                       700     4
6 A                       800     2
7 A                       900     1
8 A                      1000     0
9 B                       300    11
10 B                       400    10
# … with 14 more rows
> all.equal(out, result)
[1] TRUE

最新更新