R - Group by and conditional filter



I have a tibble similar to the following:

library(tibble)
data <- tibble(ref = c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"),
               type = c("A", "B", "A", "A", "A", "A", "B"))
data
#>   ref   type 
#> 1 ABC   A    
#> 2 ABC   B    
#> 3 XYZ   A
#> 4 XYZ   A    
#> 5 FGH   A    
#> 6 FGH   A    
#> 7 FGH   B   

I need to group by ref and, if a row with type B exists within a group, return that row; otherwise return any one row of type A (only one row per group).

Expected output:

ref   type 
1 ABC   B      
2 XYZ   A    
3 FGH   B     
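For reference, the selection rule can be written out in base R without any packages. This is a minimal sketch, not part of the original question. The helper `pick_one()` is a hypothetical name, and the sketch assumes every group contains at least one A or B row.

```r
data <- data.frame(
  ref  = c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"),
  type = c("A", "B", "A", "A", "A", "A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: position (within one group) of the row to keep.
# Prefer the first "B"; otherwise fall back to the first "A".
pick_one <- function(types) {
  if ("B" %in% types) which(types == "B")[1] else which(types == "A")[1]
}

# Group row numbers by ref, keeping groups in order of first appearance
groups <- split(seq_len(nrow(data)), factor(data$ref, levels = unique(data$ref)))

# One row number per group
rows <- vapply(groups, function(i) i[pick_one(data$type[i])], integer(1))

result <- data[rows, ]
rownames(result) <- NULL
result
```

This returns ABC/B, XYZ/A and FGH/B, in the same order as the expected output.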

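For large data sets, it is better to sort before grouping. After sorting by ref and by type in descending order, B comes before A within each ref, so the first row of each group is the one you want.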

tidyverse

library(tidyverse)
df <- tibble(ref = c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"),
             type = c("A", "B", "A", "A", "A", "A", "B"))
distinct(df) %>%                 # drop exact duplicate rows
  arrange(ref, desc(type)) %>%   # within each ref, "B" sorts before "A"
  group_by(ref) %>% 
  slice_head(n = 1) %>%          # keep the first row of each group
  ungroup()
#> # A tibble: 3 × 2
#>   ref   type 
#>   <chr> <chr>
#> 1 ABC   B    
#> 2 FGH   B    
#> 3 XYZ   A

Created on 2022-04-27 by the reprex package (v2.0.1)

data.table

library(data.table)
df <- data.frame(ref = c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"),
                 type = c("A", "B", "A", "A", "A", "A", "B"))
# sort by ref, then by type descending (B before A); keep the first row per ref
setDT(df)[order(ref, -type), .SD[1], by = ref]
#>    ref type
#> 1: ABC    B
#> 2: FGH    B
#> 3: XYZ    A

Created on 2022-04-27 by the reprex package (v2.0.1)
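A variant sketch for data.table that does not rely on the alphabetical order of the type values: `which.max(type == "B")` picks the first B row in each group, or the first row when there is none. Collecting row numbers with `.I` and subsetting once is a commonly used data.table pattern for large tables. The names `dt`, `idx` and `res` are illustrative only.

```r
library(data.table)

dt <- data.table(
  ref  = c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"),
  type = c("A", "B", "A", "A", "A", "A", "B")
)

# For each ref, .I holds the group's row numbers in dt; keep the one where
# type == "B" first occurs (which.max falls back to 1 if there is no "B").
idx <- dt[, .I[which.max(type == "B")], by = ref]

# idx$V1 holds one row number per ref, in order of first appearance
res <- dt[idx$V1]
res
```

Because of that fallback, a group containing only other types (for example C) would return its first row, so filter to A and B first if such groups can occur.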

If you only have types A and B, you can arrange in descending order and simply take the first row of each group, i.e.

library(dplyr)
data %>% 
  group_by(ref) %>% 
  filter(type %in% c('A', 'B')) %>% # in case other types exist
  arrange(desc(type)) %>%            # B rows before A rows
  slice(1L)                          # first row within each group
#> # A tibble: 3 x 2
#> # Groups:   ref [3]
#>   ref   type 
#>   <chr> <chr>
#> 1 ABC   B    
#> 2 FGH   B    
#> 3 XYZ   A
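The answer above works even though `arrange()` ignores grouping by default (unless `.by_group = TRUE`), because `slice(1L)` is still applied per group. The same sort-then-take-first idea can also be written without `group_by()`: `distinct(..., .keep_all = TRUE)` keeps the first row for each `ref` in the current row order. In this sketch, sorting on `type != "B"` in place of `desc(type)` is an assumption made to state the priority explicitly rather than relying on "B" sorting after "A".

```r
library(dplyr)

data <- tibble(
  ref  = c("ABC", "ABC", "XYZ", "XYZ", "FGH", "FGH", "FGH"),
  type = c("A", "B", "A", "A", "A", "A", "B")
)

res <- data %>%
  filter(type %in% c("A", "B")) %>%   # ignore any other types
  arrange(ref, type != "B") %>%       # FALSE sorts before TRUE: B rows first within each ref
  distinct(ref, .keep_all = TRUE)     # keep the first row seen for each ref
res
```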

We can use which.max over a logical (boolean) vector to extract the required row:

data %>%
group_by(ref) %>%
slice(which.max(type == "B")) %>%
ungroup()


#> # A tibble: 3 x 2
#>   ref   type 
#>   <chr> <chr>
#> 1 ABC   B
#> 2 FGH   B
#> 3 XYZ   A
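Here is why this works. `which.max()` returns the index of the first maximum, and a logical vector is treated as 0/1. So `which.max(type == "B")` is the position of the first B, or 1 when the group has no B at all. The vectors below are the `type` values of the three groups in the example data.

```r
# Positions chosen for each group's type vector
idx_abc <- which.max(c("A", "B") == "B")       # first TRUE is at position 2
idx_xyz <- which.max(c("A", "A") == "B")       # no TRUE at all, so position 1
idx_fgh <- which.max(c("A", "A", "B") == "B")  # first TRUE is at position 3

c(ABC = idx_abc, XYZ = idx_xyz, FGH = idx_fgh)
```

Because of the fallback to 1, a group containing only other types would return its first row. A filter such as `filter(type %in% c("A", "B"))` before `slice()` avoids that if it matters.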
