r - Find a string and exclude rows based on another string

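I can't find an answer on how to count a word in a data frame while excluding rows where another word is also found. My data frame is below: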

words <- c("INSTANCE find", "LA LA LA", "instance during",
           "instance", "instance", "instance", "find instance")
df <- data.frame(words)
df$words_count <- grepl("instance", df$words, ignore.case = TRUE)

This flags every row that contains "instance". I have been trying to exclude any row that also contains the word "find".

I could add another grepl to look for "find" and exclude based on that, but I am trying to limit the number of lines of code.

I'm sure there is a solution using a single regular expression, but you could do

df$words_count <- Reduce(`-`, lapply(c('instance', 'find'), grepl, df$words, ignore.case = TRUE)) > 0

or

df$words_count <- Reduce(`&`, lapply(c('instance', '^((?!find).)*$'), grepl, df$words, perl = TRUE, ignore.case = TRUE))

This may be easier to read:

library(tidyverse)  # provides %>% and purrr::reduce()
df$words_count <- c('instance', '^((?!find).)*$') %>% 
                    lapply(grepl, df$words, perl = TRUE, ignore.case = TRUE) %>%
                    reduce(`&`)
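For the single-regular-expression route mentioned above, one possible sketch is to put a negative lookahead at the start of the pattern. This is only an illustration under the question's sample data; the name `single_pattern` is made up here, and `perl = TRUE` is needed because lookaheads are a PCRE feature.

```r
words <- c("INSTANCE find", "LA LA LA", "instance during",
           "instance", "instance", "instance", "find instance")
df <- data.frame(words)

# ^(?!.*find)  : fail immediately if "find" occurs anywhere in the string
# .*instance   : otherwise require "instance" somewhere in the string
single_pattern <- "^(?!.*find).*instance"   # hypothetical name, for illustration

df$words_count <- grepl(single_pattern, df$words, perl = TRUE, ignore.case = TRUE)
```

Like the versions above, this matches substrings, so a word such as "finding" would also exclude a row; wrapping the words in `\\b` would restrict it to whole words.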

If all you need is the number of times "instance" appears in each string, zeroed out for any string in which "find" appears anywhere:

# "\\b" is a regex word boundary; a single "\b" in an R string is a backspace character
df$counts <- sapply(gregexpr("\\binstance\\b", words, ignore.case=TRUE), function(a) length(a[a>0])) *
  !grepl("\\bfind\\b", words, ignore.case=TRUE)
df
#             words counts
# 1   INSTANCE find      0
# 2        LA LA LA      0
# 3 instance during      1
# 4        instance      1
# 5        instance      1
# 6        instance      1
# 7   find instance      0
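
The `a[a>0]` filter in the counting step is there because `gregexpr` reports "no match" as a single `-1` rather than an empty vector. A small sketch of the difference, using a made-up two-element vector `x` that is not part of the original data:

```r
# Hypothetical input: one string with two matches, one with none
x <- c("instance and instance", "LA LA LA")
m <- gregexpr("\\binstance\\b", x, ignore.case = TRUE)

# Counting elements directly treats the -1 "no match" marker as one hit
raw_lengths <- lengths(m)

# Keeping only positive start positions gives the real number of matches
counts <- sapply(m, function(a) length(a[a > 0]))
```

Here `raw_lengths` would report one hit for the second string while `counts` reports zero.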
