I have a dataset with 3 variables:
id gender phase
a1 m 1
a1 m 2
a1 m 3
b2 m 1
b2 f 2
b2 m 3
c3 f 1
c3 f 2
c3 f 3
...
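Note that for id == b2, phase == 2, the gender was accidentally recorded as "f". It should match the other phases (gender == "m"), because gender cannot change over the course of the study. How can I write R code that detects which ids have this kind of problem? Many thanks!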
With dplyr, you can use n_distinct() to find the ids that have more than one gender value:
library(dplyr)
df %>%
  group_by(id) %>%
  filter(n_distinct(gender) > 1) %>%
  ungroup()
# # A tibble: 3 × 3
# id gender phase
# <chr> <chr> <int>
# 1 b2 m 1
# 2 b2 f 2
# 3 b2 m 3
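If you would rather not depend on dplyr, the same n_distinct() idea can be sketched in base R with tapply(). This version returns just the affected ids rather than all of their rows; the data frame at the top only re-creates the sample data from the question.

```r
# Re-create the sample data from the question
df <- data.frame(
  id     = c("a1", "a1", "a1", "b2", "b2", "b2", "c3", "c3", "c3"),
  gender = c("m", "m", "m", "m", "f", "m", "f", "f", "f"),
  phase  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Number of distinct gender values per id (base R analogue of n_distinct())
n_gender <- tapply(df$gender, df$id, function(g) length(unique(g)))

# ids whose gender is not constant across phases
bad_ids <- names(n_gender)[n_gender > 1]
bad_ids
# [1] "b2"

# All rows belonging to those ids
df[df$id %in% bad_ids, ]
```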
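You can use lag() to check whether the value in the column changes from one row to the next, and filter() to keep the ids where such a change occurs: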
df <- read.table(text="id gender phase
a1 m 1
a1 m 2
a1 m 3
b2 m 1
b2 f 2
b2 m 3
c3 f 1
c3 f 2
c3 f 3", header = TRUE)
library(dplyr)
df %>%
  group_by(id) %>%
  # lag() is NA for each id's first row; na.rm = TRUE ignores that comparison
  filter(any(gender != lag(gender), na.rm = TRUE))
#> # A tibble: 3 × 3
#> # Groups: id [1]
#> id gender phase
#> <chr> <chr> <int>
#> 1 b2 m 1
#> 2 b2 f 2
#> 3 b2 m 3
Created on 2022-07-13 by the reprex package (v2.0.1)
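Both approaches above return every row of an affected id. If you also want to see which phase is the odd one out, one option, assuming the most frequent gender within an id is the correct one, is to compare each row against that majority value. The majority() helper is hypothetical, written only for this sketch; in a tie, which.max() picks the value that sorts first alphabetically, so tied ids still need a manual look.

```r
df <- data.frame(
  id     = c("a1", "a1", "a1", "b2", "b2", "b2", "c3", "c3", "c3"),
  gender = c("m", "m", "m", "m", "f", "m", "f", "f", "f"),
  phase  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent value of x (ties -> alphabetically first)
majority <- function(x) names(which.max(table(x)))

# For every row, the majority gender of its id
df$expected <- ave(df$gender, df$id, FUN = majority)

# Rows whose gender disagrees with their id's majority gender
flagged <- df[df$gender != df$expected, ]
flagged
```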
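# Re-create the sample data
id <- c("a1", "a1", "a1", "b2", "b2", "b2", "c3", "c3", "c3")
gender <- c("m", "m", "m", "m", "f", "m", "f", "f", "f")
phase <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
mydata <- data.frame(id, gender, phase)
# Overwrite gender by hand for ids whose correct gender is already known
# (this fixes the data but does not detect which ids are affected)
mydata[mydata$id %in% c("a1", "b2"), "gender"] <- "m"
mydata[mydata$id %in% c("c3"), "gender"] <- "f"
mydata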
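Hard-coding the ids works for a small dataset, but you have to know them in advance. A sketch that derives the correction from the data instead, assuming the gender recorded at phase 1 is the correct one for each id (an assumption; use a different reference rule if phase 1 itself may be wrong):

```r
mydata <- data.frame(
  id     = c("a1", "a1", "a1", "b2", "b2", "b2", "c3", "c3", "c3"),
  gender = c("m", "m", "m", "m", "f", "m", "f", "f", "f"),
  phase  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE  # keep id as character so it can index by name
)

# Assumption: the phase-1 record holds the correct gender for each id
baseline <- mydata[mydata$phase == 1, c("id", "gender")]
ref_gender <- setNames(baseline$gender, baseline$id)

# Overwrite every row's gender with its id's phase-1 value
mydata$gender <- unname(ref_gender[mydata$id])
mydata
```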