Regex to find only a word and the special characters/digits/dots that follow it in dplyr



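I hope you are well. I need to find the rows in a text column that contain the term info when it:
  1. has no characters directly before or after it
  2. is followed by a dot or any other special character
  3. is followed by one or more digits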

Here is a snapshot of the data that may help:

df_new <- data.frame(
text=c('info is given','he is given info. in the class',
'she needs info2','why not having information',
'his info# missing', 'info12 and packages are given',
'parainfo is ready','info. was awarded',
'meeting is with .info'))
> df_new
text
1                  info is given
2 he is given info. in the class
3                she needs info2
4     why not having information
5              his info# missing
6  info12 and packages are given
7              parainfo is ready
8              info. was awarded
9           meeting is with .info

I am using this code, but it does not capture everything I need:

library(dplyr)
library(stringr)

df_new %>%
  mutate(text = tolower(text)) %>%
  mutate(string_detected = as.integer(str_detect(text, "(^|\\s)info(\\s|$)")))

So the desired result is:

text                            string_detected
info is given                                 1
he is given info. in the class                1
she needs info2                               1
why not having information                    0
his info# missing                             1
info12 and packages are given                 1
parainfo is ready                             0
info. was awarded                             1
meeting is with .info                         0

Many thanks!

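The following regex should work: (^| )info([\W\d]|$). Note that \W does not match _, so if you also want to accept info_, use (^| )info([\W\d_]|$) instead. Inside an R string literal each backslash has to be doubled, for example "(^| )info([\\W\\d]|$)".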

You can test your regex at http://regex101.com
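As a rough check of the pattern against the sample strings from the question, here is a minimal sketch. It uses base R's grepl with perl = TRUE rather than str_detect, purely so that it runs without extra packages; that substitution is an assumption of this sketch, not part of the answer above. The names pattern_underscore and underscore_detected, and the two "info_" test strings, are made up for illustration.

```r
# Sample strings copied from the question.
text <- c("info is given", "he is given info. in the class",
          "she needs info2", "why not having information",
          "his info# missing", "info12 and packages are given",
          "parainfo is ready", "info. was awarded",
          "meeting is with .info")

# Pattern from the answer; backslashes are doubled inside an R string literal.
pattern <- "(^| )info([\\W\\d]|$)"

# perl = TRUE selects PCRE, which accepts \W and \d inside a character class.
string_detected <- as.integer(grepl(pattern, tolower(text), perl = TRUE))

# Hypothetical variant that also accepts an underscore right after "info".
pattern_underscore <- "(^| )info([\\W\\d_]|$)"
underscore_detected <- as.integer(
  grepl(pattern_underscore, c("info_ file", "info_"), perl = TRUE)
)

print(data.frame(text, string_detected))
```

The same pattern string should also be usable as the second argument of str_detect inside the mutate call from the question, although that is not exercised here.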
