How to find multiple matches of a substring in a string and alter the substring's casing

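I'm trying to parse an HTML string in Ruby. The string contains multiple <pre></pre> tags, and I need to find and encode all of the < and > brackets between each pair of those tags.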

Example:
string_1_pre = "<pre><h1>Welcome</h1></pre>"
string_2_pre = "<pre><h1>Welcome</h1></pre><pre><h1>Goodbye</h1></pre>"
def clean_pre_code(html_string)
  matched = html_string.match(/(?<=<pre>).*(?=<\/pre>)/)
  cleaned = matched.to_s.gsub(/[<]/, "&lt;").gsub(/[>]/, "&gt;")
  html_string.gsub(/(?<=<pre>).*(?=<\/pre>)/, cleaned)
end
clean_pre_code(string_1_pre) #=> "<pre>&lt;h1&gt;Welcome&lt;/h1&gt;</pre>"
clean_pre_code(string_2_pre) #=> "<pre>&lt;h1&gt;Welcome&lt;/h1&gt;&lt;/pre&gt;&lt;pre&gt;&lt;h1&gt;Goodbye&lt;/h1&gt;</pre>"

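This works as long as html_string contains only one <pre></pre> element, but not when there are several: as the second result shows, the inner </pre><pre> pair gets encoded as well.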

I'm open to a solution that uses Nokogiri or something similar, but I couldn't figure out how to make it do what I want.

Let me know if you need any additional context.

Update: This is possible with just Nokogiri; see the accepted answer.

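@zstrad44 Yes, you can do this with Nokogiri. Here is a version of the code that I developed from yours; it gives you the desired result for a string containing multiple pre tags.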

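require 'nokogiri'

def clean_pre_code(html_string)
  doc = Nokogiri::HTML(html_string)
  # Select every <pre> element in the parsed document.
  all_pre = doc.xpath('//pre')
  res = ""
  all_pre.each do |pre|
    # Serialize one <pre> element at a time, so the pattern below
    # only ever sees a single <pre>...</pre> pair.
    pre_html = pre.to_html
    matched = pre_html.match(/(?<=<pre>).*(?=<\/pre>)/)
    cleaned = matched.to_s.gsub(/[<]/, "&lt;").gsub(/[>]/, "&gt;")
    res += pre_html.gsub(/(?<=<pre>).*(?=<\/pre>)/, cleaned)
  end
  # Note: only the <pre> elements are returned; markup outside them is not included.
  res
end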

I suggest reading the Nokogiri Cheatsheet to better understand the methods I used in the code. Happy coding! Hope this helps.
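
For comparison, here is a minimal regex-only sketch of the same idea. It is not the accepted approach, and it rests on assumptions: the tags are written exactly as <pre> and </pre> (no attributes), and pre elements are not nested. The greedy .* in the question runs from the first <pre> to the last </pre>, which is why the inner tags got encoded; the lazy .*? stops at the nearest </pre>, and the block form of gsub encodes each match separately. The names escape_pre_blocks and BRACKET_ENTITIES are made up for this illustration.

```ruby
# Hypothetical helper, not from the original post.
# Assumes plain <pre>...</pre> tags without attributes and without nesting.
BRACKET_ENTITIES = { "<" => "&lt;", ">" => "&gt;" }.freeze

def escape_pre_blocks(html_string)
  # .*? is lazy, so each match ends at the nearest </pre>;
  # the m flag lets . match newlines inside a multi-line <pre>.
  html_string.gsub(/(?<=<pre>).*?(?=<\/pre>)/m) do |body|
    # Encode only the brackets inside this one <pre> body.
    body.gsub(/[<>]/, BRACKET_ENTITIES)
  end
end

puts escape_pre_blocks("<pre><h1>Welcome</h1></pre><pre><h1>Goodbye</h1></pre>")
# prints: <pre>&lt;h1&gt;Welcome&lt;/h1&gt;</pre><pre>&lt;h1&gt;Goodbye&lt;/h1&gt;</pre>
```

Unlike the Nokogiri version above, this leaves any markup outside the pre elements in place, but it will miss tags such as <pre class="..."> and is only as robust as the pattern; for arbitrary HTML a real parser is the safer choice.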
