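I have a list of strings and a list of regular-expression motifs that I want to match against each other in R. When there is a match, I want to see exactly what each element of the motif matched. For example, the string TAPQQATD and the motif "P.Q.{2}D" can be matched with str_match, but that only produces this output: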
> str_match('TAPQQATD', "P.Q.{2}D")
     [,1]
[1,] "PQQATD"
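Now, I know I could edit each motif to wrap a capture group around each element (e.g. "(P)(.)(Q)(.{2})(D)"), but I would rather not, given how many motifs there are. So can I produce something like the following in R (possibly with a different function), while still using the expression "P.Q.{2}D"?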
> str_match('TAPQQATD', "(P)(.)(Q)(.{2})(D)")
     [,1]     [,2] [,3] [,4] [,5] [,6]
[1,] "PQQATD" "P"  "Q"  "Q"  "AT" "D"
Thanks!
You could try using gsub to add the parentheses automatically.
stringr::str_match('TAPQQATD',
  gsub("(.\\{\\d+?\\}|.)", "(\\1)", "P.Q.{2}D", perl = TRUE))
#      [,1]     [,2] [,3] [,4] [,5] [,6]
# [1,] "PQQATD" "P"  "Q"  "Q"  "AT" "D"
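Since the question mentions a whole list of strings and motifs, the same idea could be wrapped in a small helper and applied to all of them. The following is only a sketch in base R: `add_groups`, `strings`, `motifs`, and the second example string and motif are made-up names and data for illustration, and the helper assumes each motif consists only of single characters, `.`, and `.{n}` quantifiers. Motifs with character classes such as `[ST]` or with existing groups would need a more careful pattern.

```r
# Hypothetical helper: wrap every element of a simple motif in a capture group.
# Assumes the motif contains only single characters, "." and ".{n}" quantifiers.
add_groups <- function(motif) {
  gsub("(.\\{\\d+?\\}|.)", "(\\1)", motif, perl = TRUE)
}

strings <- c("TAPQQATD", "GGPAQCCDK")  # illustrative input strings
motifs  <- c("P.Q.{2}D", "A.C")        # illustrative motifs

# For each motif, match all strings; regexec keeps the capture groups.
results <- lapply(motifs, function(m) {
  grouped <- add_groups(m)
  regmatches(strings, regexec(grouped, strings))
})
names(results) <- motifs

add_groups("P.Q.{2}D")
# [1] "(P)(.)(Q)(.{2})(D)"
results[["P.Q.{2}D"]][[1]]
# [1] "PQQATD" "P"      "Q"      "Q"      "AT"     "D"
```

Strings that do not match a motif come back as `character(0)` in the corresponding list slot, so they can be filtered out afterwards with `lengths()`.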
We can use str_match_all from the stringr library:
library(stringr)
x <- "TAPQQATD"
str_match_all(x, "(P)(.)(Q)(.{2})(D)")
[[1]]
     [,1]     [,2] [,3] [,4] [,5] [,6]
[1,] "PQQATD" "P"  "Q"  "Q"  "AT" "D"
Or in base R, regmatches with regexec (gregexpr would return only the full match, without the capture groups):
regmatches(x, regexec("(P)(.)(Q)(.{2})(D)", x))
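If a string can contain the motif more than once, a single regexec call reports only the first occurrence. One possible base R workaround, sketched below, is to collect every full match with gregexpr and then re-run regexec on each extracted match to recover the groups. The string `x2` is invented for illustration, and the approach assumes the pattern has no anchors or lookarounds, since those could behave differently when re-matched against the extracted substring.

```r
# Illustrative string containing the motif twice (made-up data).
x2 <- "TAPQQATDxxPAQCCD"
pattern <- "(P)(.)(Q)(.{2})(D)"

# Step 1: all full matches in the string.
full <- regmatches(x2, gregexpr(pattern, x2))[[1]]

# Step 2: re-match each full match to split it into capture groups.
groups <- regmatches(full, regexec(pattern, full))

# One row per occurrence: full match first, then each group.
all_matches <- do.call(rbind, groups)
all_matches
```

Each row of `all_matches` has the same layout as one row of the str_match_all result, with the full match in the first column.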