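I hope you're doing well. I need to find the rows in text that contain the term info when it is:
- standing alone, with no characters before or after it
- followed by a dot or any special character
- followed by one or more digits
Here is a snapshot of the data that may help: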
df_new <- data.frame(
text=c('info is given','he is given info. in the class',
'she needs info2','why not having information',
'his info# missing', 'info12 and packages are given',
'parainfo is ready','info. was awarded',
'meeting is with .info'))
> df_new
text
1 info is given
2 he is given info. in the class
3 she needs info2
4 why not having information
5 his info# missing
6 info12 and packages are given
7 parainfo is ready
8 info. was awarded
9 meeting is with .info
I am using this code, but it does not capture everything I need:
library(dplyr)
library(stringr)
df_new %>%
  mutate(text = tolower(text)) %>%
  mutate(string_detected = as.integer(str_detect(text, "(^|\\s)info(\\s|$)")))
So the result I am after is:
text                             strings_detected
info is given                    1
he is given info. in the class   1
she needs info2                  1
why not having information       0
his info# missing                1
info12 and packages are given    1
parainfo is ready                0
info. was awarded                1
meeting is with .info            0
Thank you very much!
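The following regular expression should work:
(^| )info([\W\d]|$)
Note that \W does not match _, so if you also want to accept info_, use this instead:
(^| )info([\W\d_]|$)
In an R string literal each backslash has to be doubled, for example "\\W".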
You can test your regular expressions at http://regex101.com
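As a quick check, here is a minimal sketch that applies the suggested pattern to the sample data from the question. It is only one way to do it: it uses base R's grepl with perl = TRUE instead of the str_detect call from the question (an assumption on my part, so that \W and \d are definitely accepted inside the brackets), and the names pattern and pattern_underscore are just illustrative.

```r
# Sample data copied from the question
df_new <- data.frame(
  text = c("info is given", "he is given info. in the class",
           "she needs info2", "why not having information",
           "his info# missing", "info12 and packages are given",
           "parainfo is ready", "info. was awarded",
           "meeting is with .info"),
  stringsAsFactors = FALSE
)

# "info" at the start or after a space, followed by a non-word
# character, a digit, or the end of the string.
# Backslashes are doubled because this is an R string literal.
pattern <- "(^| )info([\\W\\d]|$)"

# perl = TRUE selects the PCRE engine (assumption: base R, not stringr)
df_new$string_detected <- as.integer(
  grepl(pattern, tolower(df_new$text), perl = TRUE)
)

print(df_new$string_detected)
# 1 1 1 0 1 1 0 1 0

# Variant that also accepts a trailing underscore, e.g. "info_"
pattern_underscore <- "(^| )info([\\W\\d_]|$)"
print(grepl(pattern, "info_ here", perl = TRUE))             # FALSE
print(grepl(pattern_underscore, "info_ here", perl = TRUE))  # TRUE
```

The resulting column matches the expected output in the question row by row. If you prefer to stay in the dplyr pipeline, the same pattern string can be tried in str_detect in place of the original one.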