Changing URIs in the style block of an HTML file

I have an HTML file:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css">
  td[id="module1header"] {
        width: 310px !important;
        height: 75px !important;
        margin-top: 25px;
        text-align: center !important;
        font-size: 21px !important;
        vertical-align: bottom !important;
    }
    td[class="module1"] {
        text-align:center;
        padding:0 0 0 20px !important;
        width:335px;
    }
    span[id="modulediv1"] {
        width: 370px !important;
        height: 379px !important;
        background-image: url(http://www.url.com/somefile.jpg) !important;
    }
</style>
<html>
<head>

Using Nokogiri, I want to reach every URI inside <style type="text/css"> </style> and change it. I am trying something like this:

htext = File.open(inputOpts.html_file).read
h_doc = Nokogiri::HTML(htext)
h_doc.css('td[style]').each do |style|
  # modify uri here.
end

But that code never reaches the styles. How can I access each URI in the style block and then change it?

h_doc.css('td[style]') is a CSS-matching method; in this case it matches td elements that carry a style HTML attribute:

>> Nokogiri::HTML('<style></style><td style></td>').css('td[style]').to_s
#=> "<td style></td>"

To match the style tag, you have to use a style tag selector:

>> Nokogiri::HTML('<style></style><td style></td>').css('style').to_s
#=> "<style></style>\n"

You are trying to use Nokogiri, which parses HTML, to parse CSS. You can't get there from here.

Instead, you can use Nokogiri to find the right CSS block and then work on it with ordinary String-munging tools:

require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css">
  td[id="module1header"] {
        width: 310px !important;
        height: 75px !important;
        margin-top: 25px;
        text-align: center !important;
        font-size: 21px !important;
        vertical-align: bottom !important;
    }
    td[class="module1"] {
        text-align:center;
        padding:0 0 0 20px !important;
        width:335px;
    }
    span[id="modulediv1"] {
        width: 370px !important;
        height: 379px !important;
        background-image: url(http://www.url.com/somefile.jpg) !important;
    }
</style>
<html>
<head>
EOT

At this point, doc is a parsed DOM, ready to be manipulated:

new_url = 'http://www.example.com/index.html'
doc.search('style').each do |style|
  # Replace the first url(...) found in this <style> block.
  style.content = style.content.sub(/\burl\([^)]+\)/, "url(#{ new_url })")
end
puts doc.to_html

Running that outputs:

# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html>
# >> <head>
# >> <meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
# >> <style type="text/css">
# >> 
# >>   td[id="module1header"] {
# >>         width: 310px !important;
# >>         height: 75px !important;
# >>         margin-top: 25px;
# >>         text-align: center !important;
# >>         font-size: 21px !important;
# >>         vertical-align: bottom !important;
# >>     }
# >>     td[class="module1"] {
# >>         text-align:center;
# >>         padding:0 0 0 20px !important;
# >>         width:335px;
# >>     }
# >>     span[id="modulediv1"] {
# >>         width: 370px !important;
# >>         height: 379px !important;
# >>         background-image: url(http://www.example.com/index.html) !important;
# >>     }
# >> </style>
# >> 
# >> 
# >> </head>
# >> </html>

Note that the embedded URL has now been updated.
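
The sub call above replaces only the first match in each <style> block. Since the question asks about every URI, here is a minimal sketch of one way to cover them all with gsub and a block. The helper name rewrite_css_urls, the CSS_URL constant, and the host swap are assumptions made for illustration and are not part of the original answer. It works on a plain CSS string, so it needs no gems, and it also tolerates quoted URLs:

```ruby
# Hypothetical helper (not from the original answer): yields every URL found
# inside url(...) to the caller's block and substitutes the block's result.
# Group 1 captures an optional quote character, group 2 the URL itself.
CSS_URL = /\burl\(\s*(['"]?)([^'")]+)\1\s*\)/

def rewrite_css_urls(css)
  css.gsub(CSS_URL) do
    quote = Regexp.last_match(1)
    url   = Regexp.last_match(2).strip
    "url(#{quote}#{yield(url)}#{quote})"
  end
end

css = <<~CSS
  span[id="modulediv1"] {
    background-image: url(http://www.url.com/somefile.jpg) !important;
  }
  td[class="module1"] {
    background: url("http://www.url.com/other.png");
  }
CSS

# Illustrative rewrite rule: swap the host, keep the path.
updated = rewrite_css_urls(css) { |url| url.sub('www.url.com', 'www.example.com') }
puts updated
```

Inside the Nokogiri loop, the same helper could be applied as style.content = rewrite_css_urls(style.content) { |url| ... }. This is still string munging, not CSS parsing, so unusual inputs such as URLs containing parentheses are not handled.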

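doc.search('style') returns every <style> tag found in the document. If you want one particular tag, you need to change 'style' to a more specific CSS or XPath selector, or find some way to decide, inside the loop, whether to process that particular tag.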

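style.content returns the content of the tag. That is not necessarily the text of a tag, but here it is, because the CSS stylesheet inside a <style> tag is by definition text. style.content = is how we assign content to a particular tag, so the code takes the existing CSS, performs a sub on it, and assigns the result back as the tag's content.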

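/\burl\([^)]+\)/ is a regular expression that looks for a standalone url; \b is a word boundary. \([^)]+\) means "find an opening parenthesis, all the text inside it up to the next closing parenthesis, and that closing parenthesis." That helps us isolate the chunk of text to replace. The rest should be easy to figure out.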
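
To see the pieces of that pattern at work, here is a small sketch that runs it against a few sample lines. The sample strings and the names URL_PATTERN, samples, and matches are invented for illustration:

```ruby
# The pattern described above, with its backslashes written out.
URL_PATTERN = /\burl\([^)]+\)/

samples = [
  # A normal CSS url(...) reference: matches.
  'background-image: url(http://www.url.com/somefile.jpg) !important;',
  # "url" here is part of "curl", so \b (word boundary) rejects it.
  'background: curl(not-a-css-url);',
  # Empty parentheses: [^)]+ needs at least one character, so no match.
  'background: url();'
]

# String#[] with a regex returns the matched text, or nil when nothing matches.
matches = samples.map { |line| line[URL_PATTERN] }
matches.each { |m| p m }
```

Only the first sample produces a match; the other two return nil.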
