R - How to flag/remove specific duplicates based on another variable



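I would like to know how to delete specific rows based on a particular value in a column, where the deletion depends on the other values within the same subgroup (rows sharing an id). If an "aja" row is grouped with an "ase" row, I want to delete the "aja" row. If the subgroup contains only "ase" rows or only "aja" rows, the script should keep them. I have marked which rows the script should delete.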

    id  somedata  subgroup
1   1   "aja"     okay
2   1   "aja"     okay
3   2   "ase"     okay
4   2   "aja"     delete
5   3   "aja"     delete
6   3   "ase"     okay
7   4   "aja"     okay
8   4   "aja"     okay
9   5   "ase"     okay
10  5   "ase"     okay
11  6   "aja"     delete
12  6   "ase"     okay


Code to generate the data
id = c(1,1,2,2,3,3,4,4,5,5,6,6)
somedata = c("aja","aja","ase","aja","aja","ase","aja","aja","ase","ase","aja","ase")
subgroup = c("okay","okay","okay","DELETE","DELETE","okay","okay","okay","okay","okay","DELETE","okay")
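# Note: cbind() builds a character matrix first, so id is stored as text
# (a factor in R < 4.0), not as a number.
proov = data.frame(cbind(id,somedata,subgroup))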

You can do a simple filter, i.e.

library(dplyr)
proov %>% 
group_by(id) %>% 
filter(!(n_distinct(somedata) > 1 & somedata == 'aja'))

which gives,

# A tibble: 9 x 3
# Groups:   id [6]
id    somedata subgroup
<fct> <fct>    <fct>   
1 1     aja      okay    
2 1     aja      okay    
3 2     ase      okay    
4 3     ase      okay    
5 4     aja      okay    
6 4     aja      okay    
7 5     ase      okay    
8 5     ase      okay    
9 6     ase      okay    

We can group by id and remove the rows where somedata == "aja" and the group contains at least one "ase".

library(dplyr)
proov %>% group_by(id) %>% filter(!(somedata == "aja" & any(somedata == "ase")))
#  id    somedata subgroup
# <fct> <fct>    <fct>   
#1 1     aja      okay    
#2 1     aja      okay    
#3 2     ase      okay    
#4 3     ase      okay    
#5 4     aja      okay    
#6 4     aja      okay    
#7 5     ase      okay    
#8 5     ase      okay    
#9 6     ase      okay    
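The two dplyr answers above return the same rows for the sample data, but their conditions are not identical. As a minimal sketch, assuming a hypothetical group that holds a third value "xyz" (not present in the original data), the two conditions can be compared in base R, with length(unique(x)) standing in for n_distinct(x):

```r
# Hypothetical group with a third value "xyz" and no "ase"
# (an assumption for illustration, not part of the original data).
x <- c("aja", "xyz")

# Condition from the n_distinct() answer, written in base R:
# drop "aja" whenever the group holds more than one distinct value.
drop_distinct <- length(unique(x)) > 1 & x == "aja"

# Condition from the any() answer:
# drop "aja" only if the group actually contains an "ase".
drop_any <- x == "aja" & any(x == "ase")

print(drop_distinct)  # TRUE FALSE
print(drop_any)       # FALSE FALSE
```

If somedata can only ever be "aja" or "ase", as in the question, either condition should work; with more values, the any() version follows the stated rule more literally.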

In base R, this can be written as

subset(proov, !as.logical(ave(as.character(somedata), 
id, FUN = function(x) x == "aja" & any(x == "ase"))))

Without using any additional packages, you can use the following:

proov = proov[!(proov$id %in% unique(proov[which(proov$somedata == "ase"), "id"]) & proov$somedata == "aja"),]
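The answers above all delete the rows directly. Since the question also asks about flagging, here is a minimal base R sketch that marks the rows instead of dropping them. The column name `flag` and the helper vector `has_ase` are hypothetical names chosen for illustration, and the data is rebuilt with a plain data.frame() call so the example is self-contained:

```r
# Rebuild the sample data without the hand-written subgroup column.
proov <- data.frame(
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6),
  somedata = c("aja", "aja", "ase", "aja", "aja", "ase",
               "aja", "aja", "ase", "ase", "aja", "ase"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose id group contains at least one "ase".
has_ase <- ave(proov$somedata == "ase", proov$id, FUN = any)

# Hypothetical column `flag`: mark "aja" rows that share an id with an "ase".
proov$flag <- ifelse(proov$somedata == "aja" & has_ase, "DELETE", "okay")

print(proov)

# Dropping the flagged rows afterwards leaves the same rows as the filters above.
kept <- proov[proov$flag == "okay", ]
```

Keeping the flag as a column makes it easy to inspect which rows would be removed before actually subsetting.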
