r - Search for a numeric value up to 3 words before a specific word



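Is it possible to use a regular expression to find a numeric value up to 3 words before a specific word (say, years)? In the example below I search for the word immediately before years. That works, but for the third element it returns more, whereas I need 2. The XX or more years pattern is not fixed, so I am trying to find the numeric value within the 3 words before years.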

Description <- c("Candidate having bachelor degree. Minimum 5 years in R", "Excellent academic background plus 3 years of experience in Python", "Analytics Professionals having minimum of 2 or more years of experience", "Candidate possessing credit risk experience plus 2+ years in Python", "Candidate possessing credit risk experience plus two or more years in Python")
[1] "Candidate having bachelor degree. Minimum 5 years in R"                      
[2] "Excellent academic background plus 3 years of experience in Python"          
[3] "Analytics Professionals having minimum of 2 or more years of experience"     
[4] "Candidate possessing credit risk experience plus 2+ years in Python"         
[5] "Candidate possessing credit risk experience plus two or more years in Python"

Code

library(stringr)
str_extract(Description, "\\w+(\\+)?(?= +years(\\s+of)?(\\s+programming|experience)?\\b)")
[1] "5"    "3"    "more" "2+"   "more"

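We can use a named vector to replace the English number words (generated with the english package) with digits, and then extract the number.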

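library(stringr)
library(english)
# Replace the number words "one".."nine" with digits, then capture the last
# number that precedes "years" (only non-digits may sit between them)
as.numeric(str_replace(str_replace_all(Description,
    setNames(as.character(1:9), as.character(english(1:9)))),
    ".*\\b([0-9]+)\\b[^0-9]+\\byears.*", "\\1"))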

-output

[1] 5 3 2 2 2
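The answer above does not enforce the "up to 3 words" limit from the question. Its regex takes the last number anywhere before years, with no limit on the distance. Below is a minimal base-R sketch that does enforce the limit. It assumes that only the words one to nine need converting and that a "word" is any run of non-space characters, so the number may be followed by at most two other words before years. The function name extract_years is hypothetical. Base gsub/regexec/regmatches stand in for stringr and english, so the snippet has no package dependencies.

```r
# Hypothetical helper: return the number that appears within the 3 words
# immediately before "years", or NA if there is none.
extract_years <- function(x) {
  # Assumption: only the number words one..nine need converting to digits
  number_words <- c("one", "two", "three", "four", "five",
                    "six", "seven", "eight", "nine")
  for (i in seq_along(number_words)) {
    # \\b keeps words such as "someone" or "none" intact
    x <- gsub(paste0("\\b", number_words[i], "\\b"), as.character(i), x,
              ignore.case = TRUE, perl = TRUE)
  }
  # A number (optionally followed by "+"), then at most two more words, then "years"
  pattern <- "\\b(\\d+)\\+?(?:\\s+\\S+){0,2}\\s+years\\b"
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each hit is c(full_match, captured_number), or character(0) when nothing matched
  vapply(hits, function(h) if (length(h) >= 2) as.numeric(h[2]) else NA_real_,
         numeric(1))
}

Description <- c("Candidate having bachelor degree. Minimum 5 years in R",
                 "Excellent academic background plus 3 years of experience in Python",
                 "Analytics Professionals having minimum of 2 or more years of experience",
                 "Candidate possessing credit risk experience plus 2+ years in Python",
                 "Candidate possessing credit risk experience plus two or more years in Python")

extract_years(Description)
# [1] 5 3 2 2 2
```

With this pattern, a string such as "5 apples and some extra words years" returns NA. The 5 is more than 3 words before years, whereas the answer's regex would still return 5.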
