> Suppose I have a text such as
"tnheitanhiaiin [ hello, there, will, you, help ] thitnahioetnaeitn
tnhetnh [ me, figure, this, out ] ihnteahntanitnh
nhoietnaiotniaehntehtnea [ please, because, i, dont, know ] thnthen
"
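How can I capture each word inside the brackets so that I can wrap it in single quotes?
I tried \[\s?(?:(\w*),?\s?)+\], and although it matches the bracketed part, it doesn't seem to capture anything.
The words inside the brackets can be anything.
I'd like to use gsub on each line.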
r = /
    (?<=[ ])   # match a space in a positive lookbehind
    \p{L}+     # match one or more letters
    (?=        # begin a positive lookahead
      [^\[]+?  # match one or more characters other than a left bracket, lazily
      \]       # match a right bracket
    )          # end the positive lookahead
    /x         # free-spacing regex definition mode
Letting str be the string defined in the question, we can enclose the words between brackets in single quotes as follows.
str.gsub(r) { |s| "'#{s}'" }
#=> "tnheitanhiaiin [ 'hello', 'there', 'will', 'you', 'help' ]
#    thitnahioetnaeitn\ntnhetnh [ 'me', 'figure', 'this', 'out' ]
#    ihnteahntanitnh\nnhoietnaiotniaehntehtnea [ 'please', 'because',
#    'i', 'dont', 'know' ] thnthen\n"
If we instead wish to extract those words, we would use String#scan.
str.scan(r)
#=> ["hello", "there", "will", "you", "help", "me", "figure", "this",
# "out", "please", "because", "i", "dont", "know"]
The question mark at the end of [^\[]+? (making the match lazy rather than greedy) is there for efficiency; it is not required.
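As a quick check of that claim, the sketch below runs the lazy and the greedy form over the sample text and compares what they extract. The variable names `lazy` and `greedy` are mine, for illustration only, and the sample string assumes the question's text ends each line with a newline.

```ruby
# Sample text from the question, with explicit newlines.
str = "tnheitanhiaiin [ hello, there, will, you, help ] thitnahioetnaeitn\n" \
      "tnhetnh [ me, figure, this, out ] ihnteahntanitnh\n" \
      "nhoietnaiotniaehntehtnea [ please, because, i, dont, know ] thnthen\n"

lazy   = /(?<= )\p{L}+(?=[^\[]+?\])/  # lookahead stops at the first "]"
greedy = /(?<= )\p{L}+(?=[^\[]+\])/   # lookahead overshoots, then backtracks to a "]"

lazy_words   = str.scan(lazy)
greedy_words = str.scan(greedy)

# Both forms should select the same words; only the amount of backtracking differs.
same = lazy_words == greedy_words
puts same
```

Either form can only succeed if a right bracket appears before the next left bracket, which is why the set of matches does not change.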
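I used free-spacing mode to make the regex self-documenting. Conventionally it would be written as follows.
/(?<= )\p{L}+(?=[^\[]+?\])/
This assumes (as in the example) that brackets are matched and not nested, and that each bracketed word is preceded by a space and followed by a comma or a space. If those assumptions about the characters surrounding the bracketed words do not hold, the regex can be adjusted.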
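As one possible adjustment, suppose the words may contain digits and may sit directly against a bracket or a comma with no space in between. That input shape is an assumption of mine, not something stated in the question, and the names `loose` and `sample` are hypothetical. A minimal sketch widens the lookbehind and the word class and lets the lookahead match zero intervening characters:

```ruby
# Assumed variation: words may touch "[" or "," and may contain digits.
loose = /(?<=[\[, ])[\p{L}\p{N}]+(?=[^\[]*\])/

sample = "abc [foo,bar2, baz] xyz [ok] end"
quoted = sample.gsub(loose) { |word| "'#{word}'" }

puts quoted
# prints: abc ['foo','bar2', 'baz'] xyz ['ok'] end
```

Words outside the brackets are still skipped, because the lookahead cannot reach a right bracket without first crossing a left bracket.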
You could try this:
original = "tnheitanhiaiin [ hello, there, will, you, help ] thitnahioetnaeitn\ntnhetnh [ me, figure, this, out ] ihnteahntanitnh\nnhoietnaiotniaehntehtnea [ please, because, i, dont, know ] thnthen\n"
clone = original.dup # dup, so that gsub! below does not also mutate original
original.scan(/\[(.*)\]/).flatten.map { |elem| [elem, elem.gsub(/\w+/) { |match| %Q('#{match}') }] }.each { |(pattern, replacement)| clone.gsub!(pattern, replacement) }
puts clone # =>
# tnheitanhiaiin [ 'hello', 'there', 'will', 'you', 'help' ] thitnahioetnaeitn
# tnhetnh [ 'me', 'figure', 'this', 'out' ] ihnteahntanitnh
# nhoietnaiotniaehntehtnea [ 'please', 'because', 'i', 'dont', 'know' ] thnthen
Perhaps a double gsub along the following lines:
s = "tnheitanhiaiin [ hello, there, will, you, help ] thitnahioetnaeitn\ntnhetnh [ me, figure, this, out ] ihnteahntanitnh\nnhoietnaiotniaehntehtnea [ please, because, i, dont, know ] thnthen\n"
s.gsub(/\[.*?\]/) { |m| m.gsub(/\w+/, '\'\0\'') }
#=> "tnheitanhiaiin [ 'hello', 'there', 'will', 'you', 'help' ] thitnahioetnaeitn\ntnhetnh [ 'me', 'figure', 'this', 'out' ] ihnteahntanitnh\nnhoietnaiotniaehntehtnea [ 'please', 'because', 'i', 'dont', 'know' ] thnthen\n"
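Since the question asks about running gsub on each line, the same two-step idea can be wrapped in a small helper and mapped over the lines. This is only a sketch: the method name `quote_bracketed` and the shortened sample text are hypothetical, and it assumes brackets never span a line break.

```ruby
# Hypothetical helper: quote every word inside each [...] group of one line.
def quote_bracketed(line)
  # Outer gsub finds each bracketed group; inner gsub quotes the words in it.
  line.gsub(/\[.*?\]/) { |group| group.gsub(/\w+/) { |word| "'#{word}'" } }
end

text = "aa [ hello, there ] bb\ncc [ me, figure ] dd\n"

# each_line keeps the trailing newline on each line, so join restores the text.
result = text.each_line.map { |line| quote_bracketed(line) }.join

puts result
# prints:
# aa [ 'hello', 'there' ] bb
# cc [ 'me', 'figure' ] dd
```

Using a block for the inner replacement avoids having to escape the backreference inside a quoted replacement string.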