R regex: how to list which part of a string each part of the pattern matched



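I have a list of strings and a list of regex motifs that I want to match against each other in R. When there is a match, I want to see exactly what each part of the motif matched. For example, the string TAPQQATD and the motif "P.Q.{2}D" can be matched with str_match, but it only returns the full match: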

> str_match('TAPQQATD', "P.Q.{2}D")
     [,1]
[1,] "PQQATD"

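Now, I know I could edit every motif to put a capture group around each element (as in "(P)(.)(Q)(.{2})(D)"), but I would rather not, because there are so many of them. Is there a way in R (possibly with a different function) to produce something like the following while still using the expression "P.Q.{2}D"?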

> str_match('TAPQQATD', "(P)(.)(Q)(.{2})(D)")  
     [,1]     [,2] [,3] [,4] [,5] [,6]
[1,] "PQQATD" "P"  "Q"  "Q"  "AT" "D"

Thanks!

You can try using gsub to add the parentheses to the motif automatically.

stringr::str_match('TAPQQATD',
  gsub("(.\\{\\d+?\\}|.)", "(\\1)", "P.Q.{2}D", perl=TRUE))
#     [,1]     [,2] [,3] [,4] [,5] [,6]
#[1,] "PQQATD" "P"  "Q"  "Q"  "AT" "D"
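Since the question mentions many motifs, the gsub step can be wrapped in a small helper. The following is only a minimal sketch: the names add_groups and match_parts are made up for illustration and are not part of stringr or base R. It assumes each motif consists only of single characters, ".", and "{n}" quantifiers; character classes such as [ST], alternation, or escaped characters would need a more careful tokenizer.

```r
# Hypothetical helper: wrap every token of a simple motif in a capture group.
# Assumption: the motif contains only single characters, "." and "{n}" quantifiers.
add_groups <- function(motif) {
  gsub("(.\\{\\d+\\}|.)", "(\\1)", motif, perl = TRUE)
}

# Hypothetical helper: return the full match followed by one element per token.
# Uses base R only; the result is character(0) when the motif does not match.
match_parts <- function(string, motif) {
  regmatches(string, regexec(add_groups(motif), string, perl = TRUE))[[1]]
}

motifs <- c("P.Q.{2}D", "A.Q")

# Rewritten motifs, one capture group per token
add_groups(motifs)

# One character vector per motif for the example string
lapply(motifs, match_parts, string = "TAPQQATD")
```

To cover several strings as well, the same call can be nested, for example lapply(strings, function(s) lapply(motifs, match_parts, string = s)), where strings is your own character vector.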

We can use str_match_all from the stringr library:

library(stringr)

x <- "TAPQQATD"
str_match_all(x, "(P)(.)(Q)(.{2})(D)")
# [[1]]
#      [,1]     [,2] [,3] [,4] [,5] [,6]
# [1,] "PQQATD" "P"  "Q"  "Q"  "AT" "D"

Or in base R, with regmatches and regexec (gregexpr would return only the full match here, not the capture groups):

regmatches(x, regexec("(P)(.)(Q)(.{2})(D)", x))
