Extract the dates for each ID based on keywords in R


I am trying to extract, for each ID, the rows whose Comments column contains the word "enrolled", "probable", or "confirmed".

ID <- c("1","1","1","1","1","2","2","2","2","3","3","4","4")
Comments <-c("employee enrolled"," report generated","employee performed","promotion probable","employee confirmed","employee enrolled"
," writen test completed","employee confirmed ","employee started","employee enrolled "
,"probable employee"," employee enrolled","employee started ")

Date<-c("2020-07-14","2020-07-15","2020-07-15","2020-07-16","2020-07-30","2020-07-01","2020-07-02",
"2020-07-03","2020-07-04","2020-07-30","2020-07-31","2020-07-23","2020-07-23")            

df<- data.frame(ID,Comments,Date)  

Expected output:

ID   Comments             Date
1    employee enrolled    2020-07-14
1    promotion probable   2020-07-16
1    employee confirmed   2020-07-30
2    employee enrolled    2020-07-01
2    employee confirmed   2020-07-03
3    employee enrolled    2020-07-30
3    probable employee    2020-07-31
4    employee enrolled    2020-07-23

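We can use str_detect from stringr inside dplyr's filter. The pattern joins the three keywords with |, so a row is kept if its Comments value contains any of them: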

library(dplyr)
library(stringr)
df %>%
    filter(str_detect(Comments, 'enrolled|probable|confirmed'))

-output

# ID            Comments       Date
#1  1   employee enrolled 2020-07-14
#2  1  promotion probable 2020-07-16
#3  1  employee confirmed 2020-07-30
#4  2   employee enrolled 2020-07-01
#5  2 employee confirmed  2020-07-03
#6  3  employee enrolled  2020-07-30
#7  3   probable employee 2020-07-31
#8  4   employee enrolled 2020-07-23

Or with grepl from base R

subset(df, grepl('enrolled|probable|confirmed', Comments))
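
Both patterns above match substrings, so a comment such as "improbable" would also be kept. If that matters, the following is a minimal base R sketch of a stricter variant. It assumes whole-word, case-insensitive matching is wanted and that the stray spaces in Comments should be trimmed from the result. None of this is stated in the question, and the names pattern and result are only illustrative.

```r
# Rebuild the data from the question so this block runs on its own.
ID <- c("1","1","1","1","1","2","2","2","2","3","3","4","4")
Comments <- c("employee enrolled"," report generated","employee performed",
              "promotion probable","employee confirmed","employee enrolled",
              " writen test completed","employee confirmed ","employee started",
              "employee enrolled ","probable employee"," employee enrolled",
              "employee started ")
Date <- c("2020-07-14","2020-07-15","2020-07-15","2020-07-16","2020-07-30",
          "2020-07-01","2020-07-02","2020-07-03","2020-07-04","2020-07-30",
          "2020-07-31","2020-07-23","2020-07-23")
df <- data.frame(ID, Comments, Date)

# Assumption: keywords should match as whole words only.
# \\b marks a word boundary; perl = TRUE selects PCRE syntax.
pattern <- "\\b(enrolled|probable|confirmed)\\b"

# ignore.case = TRUE also accepts e.g. "Enrolled" or "CONFIRMED".
result <- subset(df, grepl(pattern, Comments, ignore.case = TRUE, perl = TRUE))

# Assumption: leading/trailing spaces in Comments are noise, so trim them.
result$Comments <- trimws(result$Comments)

result
```

On the sample data this keeps the same rows as the two answers above, since every keyword already appears as a separate word.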
