I have a data frame:
ID_Student Text_Message
1 John Doe Hell like I want to fxxk around
2 Peter Gynn You such an ass
3 Jolie Hope Go to hell
I also have a vector:
> Ban_words
[1] "fxxk" "ass" "hell"
How can I return the IDs of the students who used banned words, together with the words they used? Any ideas?
My solution so far:
Data:
ID_Student <- c("John Doe", "Peter Gynn", "Jolie Hope", "Mike Tyson")
Text_Message <- c("hell I want to fxxk around", "You such an ass", "Go to hell", "I love you")
Ban_words <- c("fxxk", "ass", "hell")
Student_Message <- data.frame(ID_Student, Text_Message)
The data frame should look like this:
ID_Student Text_Message
1 John Doe hell I want to fxxk around
2 Peter Gynn You such an ass
3 Jolie Hope Go to hell
4 Mike Tyson I love you
Code:
library(dplyr)

# Print the students who used each banned word, one line per word
for (i in Ban_words) {
  Detention_List <- Student_Message %>%
    filter(grepl(i, Text_Message)) %>%
    pull(ID_Student)
  print(Detention_List)
}
This returns:
[1] "John Doe"
[1] "Peter Gynn"
[1] "John Doe" "Jolie Hope"
So for the banned word 'fxxk', only John used it, but for the word 'hell', both John and Jolie used it.
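The loop overwrites Detention_List on every pass, so once it finishes only the result for the last word ("hell") is kept. A minimal base-R sketch (lapply instead of dplyr; the name users_by_word is hypothetical) that keeps one entry per banned word, so the word-to-student mapping described above stays available:

```r
# Data from the question; stringsAsFactors = FALSE keeps text as character in R < 4.0
ID_Student <- c("John Doe", "Peter Gynn", "Jolie Hope", "Mike Tyson")
Text_Message <- c("hell I want to fxxk around", "You such an ass", "Go to hell", "I love you")
Ban_words <- c("fxxk", "ass", "hell")
Student_Message <- data.frame(ID_Student, Text_Message, stringsAsFactors = FALSE)

# For each banned word, collect the IDs of students whose message contains it
# (fixed = TRUE treats the word as a literal string, not a regex)
users_by_word <- lapply(Ban_words, function(w) {
  Student_Message$ID_Student[grepl(w, Student_Message$Text_Message, fixed = TRUE)]
})
names(users_by_word) <- Ban_words

users_by_word
```

Here users_by_word$hell holds both "John Doe" and "Jolie Hope", matching the loop's last printed line.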
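We can paste all the banned words into a single regular expression with paste(collapse = "|"), filter with grepl on that regex, and then pull the vector of names. As you can see, this returns every student's name except "Mike", because he didn't use any banned word (see my edited data).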
library(dplyr)
df %>%
  filter(grepl(paste(Ban_words, collapse = '|'), Text_Message)) %>%
  pull(student)
[1] "John Doe" "Peter Gyn" "Jolie Hope"
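A plain alternation like "fxxk|ass|hell" is case-sensitive and also matches inside longer words, so "ass" matches "class" while "HELL" is missed. A sketch assuming whole-word, case-insensitive matching is what's wanted; the messages below are made-up examples, not data from the question:

```r
Ban_words <- c("fxxk", "ass", "hell")
# Hypothetical messages chosen to show the difference between the two patterns
messages <- c("Go to hell", "HELL no", "Hello class", "I love you")

# Plain alternation, as in the answer
naive <- paste(Ban_words, collapse = "|")
# \\b anchors each alternative at word boundaries: "\\b(fxxk|ass|hell)\\b"
whole_word <- paste0("\\b(", paste(Ban_words, collapse = "|"), ")\\b")

naive_hits <- grepl(naive, messages)
# ignore.case = TRUE also catches "HELL"; perl = TRUE uses PCRE for \\b
strict_hits <- grepl(whole_word, messages, ignore.case = TRUE, perl = TRUE)

naive_hits   # TRUE FALSE TRUE FALSE  ("class" is a false positive)
strict_hits  # TRUE TRUE FALSE FALSE
```

If a banned word could contain regex metacharacters (such as . or +), it would need escaping before being pasted into the pattern.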
df <- data.frame(student = c('John Doe', 'Peter Gyn', 'Jolie Hope', 'Mike'),
                 Text_Message = c('I want to fxxk around', 'You such an ass', 'Go to hell', "I love you"))
> df
student Text_Message
1 John Doe I want to fxxk around
2 Peter Gyn You such an ass
3 Jolie Hope Go to hell
4 Mike I love you
Ban_words<-c("fxxk", "ass", "hell")
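The pipeline above returns only the names, while the question also asks which words each student used. A base-R sketch, assuming the answer's df and Ban_words and the same alternation pattern, that extracts the matched words with gregexpr and regmatches (words_used and detention are hypothetical names):

```r
df <- data.frame(
  student = c("John Doe", "Peter Gyn", "Jolie Hope", "Mike"),
  Text_Message = c("I want to fxxk around", "You such an ass", "Go to hell", "I love you"),
  stringsAsFactors = FALSE
)
Ban_words <- c("fxxk", "ass", "hell")

# Same single regex as the answer: "fxxk|ass|hell"
pattern <- paste(Ban_words, collapse = "|")

# List with one character vector of matched words per message
# (character(0) when a message contains no banned word)
found <- regmatches(df$Text_Message, gregexpr(pattern, df$Text_Message))
df$words_used <- vapply(found, paste, character(1), collapse = ", ")

# Keep only students with at least one match
detention <- df[lengths(found) > 0, c("student", "words_used")]
detention
```

For this data, John Doe is listed with fxxk, Peter Gyn with ass and Jolie Hope with hell; Mike is dropped. A message containing several banned words would list them all, separated by commas.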