r - How to return the IDs of students who used banned words in text messages (updated)



I have a data frame:

ID_Student                    Text_Message
1   John Doe Hell like I want to fxxk around
2 Peter Gynn                 You such an ass
3 Jolie Hope                      Go to hell
and a vector:
> Ban_words
[1] "fxxk" "ass"  "hell"

How can I return the IDs of the students who used banned words, together with the words they used? Any good ideas?

My solution so far.

Data
ID_Student <- c("John Doe", "Peter Gynn", "Jolie Hope", "Mike Tyson")
Text_Message <- c("hell I want to fxxk around", "You such an ass", "Go to hell", "I love you")
Ban_words <- c("fxxk", "ass", "hell")
Student_Message <- data.frame(ID_Student, Text_Message)

The data frame should look like this:

ID_Student               Text_Message
1   John Doe hell I want to fxxk around
2 Peter Gynn            You such an ass
3 Jolie Hope                 Go to hell
4 Mike Tyson                 I love you

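Code
library(dplyr)

# For each banned word, print the IDs of the students whose message contains it
for (i in Ban_words) {
  Detention_List <- Student_Message %>%
    filter(grepl(i, Text_Message)) %>%
    pull(ID_Student)
  print(Detention_List)
}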

This returns:

[1] "John Doe"
[1] "Peter Gynn"
[1] "John Doe"   "Jolie Hope"

So for the banned word "fxxk", only John used it. For the word "hell", both John and Jolie used it.
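The loop above only prints each result, so the link between a word and its users is lost once it finishes. As a minimal sketch in base R (the name `users_by_word` is hypothetical, and `fixed = TRUE` is an assumption that the banned words should be matched literally rather than as regular expressions), the same per-word lookup can be stored in a named list:

```r
ID_Student <- c("John Doe", "Peter Gynn", "Jolie Hope", "Mike Tyson")
Text_Message <- c("hell I want to fxxk around", "You such an ass", "Go to hell", "I love you")
Ban_words <- c("fxxk", "ass", "hell")
Student_Message <- data.frame(ID_Student, Text_Message, stringsAsFactors = FALSE)

# For each banned word, collect the students whose message contains it.
# fixed = TRUE treats the word as a literal string, not as a regex.
users_by_word <- lapply(Ban_words, function(w) {
  Student_Message$ID_Student[grepl(w, Student_Message$Text_Message, fixed = TRUE)]
})

# Name each list element after its banned word so the word stays attached.
names(users_by_word) <- Ban_words
print(users_by_word)
```

Each element can then be looked up by word, for example `users_by_word[["hell"]]`.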

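We can combine all the banned words into a single regular expression with paste(collapse = "|") and then filter on that regex with grepl. After that, pull extracts the vector of names. As you can see, this returns the names of all students except "Mike", because he did not use a banned word (see my edited data).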

library(dplyr)
# Keep the rows whose message matches any banned word, then extract the names
df %>%
  filter(grepl(paste(Ban_words, collapse = '|'), Text_Message)) %>%
  pull(student)
[1] "John Doe"   "Peter Gyn"  "Jolie Hope"

df <- data.frame(student = c('John Doe', 'Peter Gyn', 'Jolie Hope', 'Mike'),
                 Text_Message = c('I want to fxxk around', 'You such an ass', 'Go to hell', "I love you"))
> df
student          Text_Message
1   John Doe I want to fxxk around
2  Peter Gyn       You such an ass
3 Jolie Hope            Go to hell
4       Mike            I love you
Ban_words <- c("fxxk", "ass", "hell")
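The filter above returns the names but not the words each student used, which the question also asks for. One possible way to get both, sketched in base R with `gregexpr()` and `regmatches()`: the names `pattern`, `hits`, `used`, `result` and `words_used` are hypothetical, and the `\\b` word boundaries and case-insensitive matching are assumptions about the desired behaviour (they keep "ass" from matching inside "class"), not part of the original answer.

```r
df <- data.frame(
  student = c("John Doe", "Peter Gyn", "Jolie Hope", "Mike"),
  Text_Message = c("I want to fxxk around", "You such an ass", "Go to hell", "I love you"),
  stringsAsFactors = FALSE
)
Ban_words <- c("fxxk", "ass", "hell")

# Build one alternation, anchored at word boundaries on both sides.
pattern <- paste0("\\b(?:", paste(Ban_words, collapse = "|"), ")\\b")

# gregexpr() locates every match; regmatches() extracts the matched text,
# giving one character vector of hits per message.
hits <- regmatches(
  df$Text_Message,
  gregexpr(pattern, df$Text_Message, ignore.case = TRUE, perl = TRUE)
)

# Keep only students with at least one hit and join their words into one string.
used <- lengths(hits) > 0
result <- data.frame(
  student = df$student[used],
  words_used = vapply(
    hits[used],
    function(w) paste(unique(tolower(w)), collapse = ", "),
    character(1)
  ),
  stringsAsFactors = FALSE
)
print(result)
```

With the sample data, "Mike" is left out of `result` because his message contains none of the banned words. If substring matches are wanted instead (as with the plain `grepl` call above), the two `\\b` anchors can be dropped.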
