Replace words with spaces in a tibble in R without using an anti-join



I have a bunch of sentences like this:

# A tibble: 1,782 × 1

Chat
<chr>                                                                                                                                                                    
1 Hi i would like to find out more about the trials
2 Hello I had a guest 
3 Hello my friend overseas right now
...

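What I want to do is remove stop words such as "I" and "hello". I already have a list of them, and I want to replace each stop word with a space. I tried using mutate and gsub, but gsub only takes a single regular expression. An anti-join does not work here because I am trying to build bigrams/trigrams, so I do not have a one-word-per-row column to anti-join the stop words against.

Is there a way to replace all of these words in every sentence in R?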

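We can unnest the tokens, replace the values in 'word' that appear in the 'word' column of 'stop_words' with a space (" "), and then paste the words back together after grouping by 'lines':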

library(tidytext)
library(tidyverse)
rowid_to_column(df1, 'lines') %>% 
unnest_tokens(word, Chat) %>% 
mutate(word = replace(word, word %in% stop_words$word, " ")) %>% 
group_by(lines) %>% 
summarise(Chat = paste(word, collapse=' ')) %>%
ungroup %>%
select(-lines)

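Note: this replaces the stop words found in the 'stop_words' dataset with " ". If we only need to replace a custom subset of stop words, create a vector of those elements and change the mutate step accordingly: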

v1 <- c("I", "hello", "Hi")
rowid_to_column(df1, 'lines') %>%
...
...
mutate(word = replace(word, word %in% tolower(v1), " ")) %>%  # unnest_tokens() lowercases tokens by default
...
...
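The tokenise, replace, re-paste idea can also be sketched in base R, which may help when tidytext is not available. This is only a minimal sketch under a few assumptions: the sample `df1` is a stand-in for the real data, `replace_stopwords` is a hypothetical helper name, and tokens are split on whitespace only, so punctuation attached to a word would prevent a match.

```r
# Hypothetical sample data standing in for df1
df1 <- data.frame(
  Chat = c("Hi i would like to find out more about the trials",
           "Hello I had a guest",
           "Hello my friend overseas right now"),
  stringsAsFactors = FALSE
)
v1 <- c("I", "hello", "Hi")

# Hypothetical helper: split one sentence on whitespace, swap every
# stop word (compared case-insensitively) for a single space, then
# glue the tokens back together.
replace_stopwords <- function(sentence, stopwords) {
  tokens <- strsplit(sentence, "\\s+")[[1]]
  tokens[tolower(tokens) %in% tolower(stopwords)] <- " "
  paste(tokens, collapse = " ")
}

# Apply the helper to every row of the Chat column
df1$Chat <- vapply(df1$Chat, replace_stopwords, character(1),
                   stopwords = v1, USE.NAMES = FALSE)
df1$Chat
```

Unlike unnest_tokens, this keeps the original capitalisation and punctuation of the words that are not replaced.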

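We can build a pattern of the form "\\bstopword\\b" (the backslash has to be doubled inside an R string) and then use gsub to replace the matches with " ". Below is an example. Note that I set ignore.case = TRUE to match both lowercase and uppercase, but you may want to adjust that to your needs.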

dat <- read.table(text = "Chat
1 'Hi i would like to find out more about the trials'
2 'Hello I had a guest' 
3 'Hello my friend overseas right now'",
header = TRUE, stringsAsFactors = FALSE)
dat
#                                                Chat
# 1 Hi i would like to find out more about the trials
# 2                               Hello I had a guest
# 3                Hello my friend overseas right now
# A list of stop words
stopword <- c("I", "Hello", "Hi")
# Create the pattern: wrap each stop word in word boundaries
stopword2 <- paste0("\\b", stopword, "\\b")
# Join the individual patterns with "|" (regex OR)
stopword3 <- paste(stopword2, collapse = "|")
# View the pattern
stopword3
# [1] "\\bI\\b|\\bHello\\b|\\bHi\\b"
dat$Chat <- gsub(pattern = stopword3, replacement = " ", x = dat$Chat, ignore.case = TRUE)
dat
#                                               Chat
# 1     would like to find out more about the trials
# 2                                      had a guest
# 3                     my friend overseas right now
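If the same replacement is needed in several places, the pattern building and the gsub call could be wrapped in a small function. The sketch below is only an illustration: `replace_stopwords_regex` and its arguments are hypothetical names, and it assumes the stop words contain no regex metacharacters (they are pasted into the pattern unescaped).

```r
# Hypothetical helper: build one alternation pattern from the stop words
# and replace every whole-word match in x.
replace_stopwords_regex <- function(x, stopwords, replacement = " ",
                                    ignore_case = TRUE) {
  # With no stop words there is nothing to replace
  if (length(stopwords) == 0) return(x)
  # \\b marks a word boundary, so "Hi" does not match inside "This"
  pattern <- paste0("\\b(", paste(stopwords, collapse = "|"), ")\\b")
  gsub(pattern, replacement, x, ignore.case = ignore_case)
}

chat <- c("Hi i would like to find out more about the trials",
          "Hello I had a guest",
          "Hello my friend overseas right now")

# Case-insensitive: "Hi", "i", "Hello" and "I" are all replaced
res_ci <- replace_stopwords_regex(chat, c("I", "Hello", "Hi"))

# Case-sensitive: the lowercase "i" in the first sentence is kept
res_cs <- replace_stopwords_regex(chat, c("I", "Hello", "Hi"),
                                  ignore_case = FALSE)
```

Because the function works on a plain character vector, it can be called on a data frame column directly or inside mutate.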
