How to add a trademark symbol

I'm trying to add a trademark symbol after every instance of "Imagination Playground" in an HTML document, but I end up with something like this:

&lt;i class="fa fa-trademark"&gt;&lt;/i&gt;

It seems the markup I'm inserting is being converted to HTML entities. How can I keep it from being escaped?

Here is my original Ruby code:

require 'nokogiri'

body = "<p>Whether you want to build a playground, make play a priority in your community, or learn more about Imagination Playground , we've got webinars for you in March!</p>
  <p>As always, all our webinars are FREE. All you need to participate is a phone and a computer with an Internet connection.</p>"
new_body = Nokogiri::HTML(body)
new_body.encoding = 'UTF-8'
new_body.css('p','a').each { |p|
  # content= treats the string as plain text, which is what escapes the <i> tag
  p.content = p.content.gsub(/Imagination Playground\s/, 'Imagination Playground<i class="fa fa-trademark"></i>')
}
puts new_body

This is what I get:

<p>Whether you want to build a playground, make play a priority in your community, or learn more about Imagination Playground&lt;i class="fa fa-trademark"&gt;&lt;/i&gt;, we've got webinars for you in March!</p>
<p>As always, all our webinars are FREE. All you need to participate is a phone and a computer with an Internet connection.</p>

How can I insert that HTML snippet without its angle brackets and other special characters being escaped?

Here's how I'd do it:

require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<p>Whether you want to build a playground, make play a priority in your community, or learn more about Imagination Playground , we've got webinars for you in March!</p>
<p>As always, all our webinars are FREE. All you need to participate is a phone and a computer with an Internet connection.</p>
EOT
doc.encoding = 'UTF-8'
doc.css('p').each do |p|
  p.children = p.content.gsub(/Imagination Playground\s/, 'Imagination Playground<i class="fa fa-trademark"></i>')
end
puts doc

This results in:

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >> <p>Whether you want to build a playground, make play a priority in your community, or learn more about Imagination Playground<i class="fa fa-trademark"></i>, we've got webinars for you in March!</p>
# >> <p>As always, all our webinars are FREE. All you need to participate is a phone and a computer with an Internet connection.</p>
# >> </body></html>

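Nokogiri is pretty smart. When it sees children=, it checks whether it is being given a string. If so, it parses that string into nodes and replaces the existing child nodes with the new ones. That is very different from content=, whose value Nokogiri knows should be text, so it encodes embedded tags as &lt; and so on. This is covered in the documentation.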

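For children=:

Set the inner html for this Node to node_or_tags. node_or_tags can be a Nokogiri::XML::Node, a Nokogiri::XML::DocumentFragment, or a string containing markup.

For content=:

Set the Node's content to a Text node containing string. The string gets XML escaped, not interpreted as markup.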

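This won't work if I want to keep the HTML tags that are already inside the paragraph. Try it with <p>fsome test and then <b>bold</b></p>

You're changing the requirements. Don't do that. Be specific about what you need so we can answer the real question once.

Only a small change is needed to keep the tags inside the paragraph. Use children.to_html to get the embedded nodes as an HTML string, then gsub that string and assign the result: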

require 'nokogiri'
doc = Nokogiri::HTML('<p>Imagination Playground<b>foo</b></p>')
puts doc.to_html

It starts out looking like this:

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><p>Imagination Playground<b>foo</b></p></body></html>

Modify the DOM:

doc.search('p').each do |p|
  p.children = p.children.to_html.gsub(/Imagination Playground\s?/, 'Imagination Playground<i class="fa fa-trademark"></i>')
end
puts doc

Now it looks like:

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body><p>Imagination Playground<i class="fa fa-trademark"></i><b>foo</b></p></body></html>

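Note that I'm using search rather than css. Prefer the generic method over the more specific one: it makes it easier to switch to XPath selectors later if you need to.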

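Also, I used a smarter pattern in gsub that consumes a single trailing whitespace character if one is present. That isn't necessary for HTML, because browsers collapse whitespace, but it would be the right approach if you were processing a plain text document or preformatted text.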
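A plain-Ruby sketch of the difference, assuming the patterns in question are \s (whitespace required) and \s? (whitespace optional). The "(TM)" marker is just a stand-in for the icon markup.

```ruby
MARK = '(TM)' # stand-in for the <i> tag, to keep the output readable

with_space    = "learn more about Imagination Playground , we've got webinars"
without_space = 'Imagination Playground<b>foo</b>'

required = /Imagination Playground\s/   # only matches when whitespace follows
optional = /Imagination Playground\s?/  # matches with or without whitespace

a = with_space.gsub(required, "Imagination Playground#{MARK}")
b = without_space.gsub(required, "Imagination Playground#{MARK}")
c = without_space.gsub(optional, "Imagination Playground#{MARK}")

puts a # the stray space before the comma is consumed
puts b # unchanged: no whitespace follows the name
puts c # matched even without trailing whitespace
```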

And, just to show in more detail what Nokogiri sees:

doc.search('p').first 
# => #(Element:0x3fd222462204 {
#      name = "p",
#      children = [
#        #(Text "Imagination Playground"),
#        #(Element:0x3fd2224608f0 { name = "b", children = [ #(Text "foo")] })]
#      })
doc.search('p').first.children 
# => [#<Nokogiri::XML::Text:0x3fd222461688 "Imagination Playground">, #<Nokogiri::XML::Element:0x3fd2224608f0 name="b" children=[#<Nokogiri::XML::Text:0x3fd22245fe64 "foo">]>]
