Why can't I reach this table with my Nokogiri accessor?

This is the XPath that Chrome's "Inspect Element" gives me:

//*[@id="configparse_port_list"]

This is the Nokogiri CSS selector I'm using to access the table:

doc.css("#configparse_port_list")

But all I get back is an empty array.

What am I doing wrong?

This doesn't work either:

doc.css('table[@id="configparse_port_list"]')

The HTML:

<!DOCTYPE html>
<html>
<head>
  <title>SIAM</title>
  <link href="/assets/application-49cce08127ac99204d4cb59e3bfaab8e.css" media="all" rel="stylesheet" type="text/css" />
  <script src="/assets/application-50259c7e8f6a002b7166ab714e68857b.js" type="text/javascript"></script>
  <script src="/assets/controllers/configparse_ports-925b92a6e41f7ffc3014e351d29291fc.js" type="text/javascript"></script>
  <meta content="authenticity_token" name="csrf-param" />
<meta content="FFh3mbfqnLZhWclBmQ/kEeYSJPeQvapaC0tK9f4wWH8=" name="csrf-token" />
</head>
<body class="configparse_ports_index ctrl_configparse_ports" data-controller="configparse_ports" data-action="index">
    <div id="header">
        <a href="https://siam-pro.qa.domain.com/"><img alt="domain_logo" src="/assets/domain_logo-0e44a80f1d9f1f9ce8fb7aa35dbc008b.png" /></a>
        <div>
            <div class="product_name">SIAM</div>
            <div class="version">v5.1</div>
        </div>
        <form accept-charset="UTF-8" action="/search/quick.json" class="ignoreDirty" data-remote="true" id="quick_search" method="post"><div style="margin:0;padding:0;display:inline    "><input name="utf8" type="hidden" value="&#x2713;" /><input name="authenticity_token" type="hidden" value="FFh3mbfqnLZhWclBmQ/kEeYSJPeQvapaC0tK9f4wWH8=" /></div>
            <input id="search_testcases" name="search[testcases]" type="hidden" value="true" />
            <input id="search_testplans" name="search[testplans]" type="hidden" value="true" />
            <input id="search_component_names" name="search[component_names]" type="hidden" value="true" />
            <input autocomplete="off" id="search_term" name="search[term]" placeholder="Search" type="text" />
</form>
        <ul class="menu">
            <li><a href="https://siam-pro.qa.domain.com/">Home</a></li>
                <li><a href="/settings">Settings</a></li>
        </ul>
    </div>
    <div id="wrapper">
        <div id="content">
            <div id="loading">Loading ...</div>
            <div id="flash">

            </div>
            <div id="warning_message"></div>
            <h1>Listing Configparse Ports</h1>
<div id="configparse_port_filters" class="filter_wrap">
    <h4>Filter &nbsp;</h4>
</div>
<table id="configparse_port_list">
    <thead>
        <tr>
            <th>ID #</th>
            <th>Name</th>
            <th>ANI Release</th>
            <th>Network Configuration</th>
            <th>State</th>
        </tr>
    </thead>
    <tbody>
            <tr>
#MANY TRS - one of which I'm looking for based on the 3rd td (ANI Release)
            </tr>
    </tbody>
</table>

        </div>
    </div>
    <div id="sidebar">
        <h3>Testcases</h3>
        <ul>
            <li><a href="/testcases/new">New</a></li>
            <li><a href="/search/testcase/new">Search</a></li>
            <li><a href="/search/bugzilla_cr/new">Import RTC</a></li>
        </ul>
        <h3>Testplans</h3>
        <ul>
            <li><a href="/testplans/new">New</a></li>
            <li><a href="/search/testplan/new">Search</a></li>
            <li><a href="/testplans">List Active</a></li>
        </ul>
        <h3>Use Cases</h3>
        <ul>
            <li><a href="/use_cases/new">New</a></li>
            <li><a href="/search/use_case/new">Search</a></li>
            <li><a href="/use_cases/manage">Manage</a></li>
        </ul>
        <h3>Configparse</h3>
        <ul>
            <li><a href="/configparse_ports/new">New</a></li>
            <li><a href="/configparse_ports">List Ports</a></li>
        </ul>
        <h3>Automation</h3>
        <ul>
            <li><a href="/automation_suites/new">New</a></li>
            <li><a href="/search/automation_suite/new">Search</a></li>
            <li><a href="/automation/status">Status</a></li>
        </ul>
    </div>
    <div id="footer">
        <div>
            <ul class="menu">
                <li><a href="mailto:siam-help@domain.com">Email SIAM Support</a></li>
                <li><a href="http://agora.domain.com/wiki/SIAM">SIAM WIKI</a></li>
            </ul>
            <div class="copyright">&copy; 2012 domain Technologies</div>
        </div>
    </div>
    <script id="quick_search_results_template" type="text/html">
<div>
    {{#resources}}
    <div class="search_result search_result_{{internal_name}}">
        <h4>{{name}}</h4>
        {{#count}}
        <table>
            <thead>
                <tr>
                    <th>ID</th>
                    <th></th>
                </tr>
            </thead>
            <tbody>
                {{#results}}
                <tr class="search_result_{{id}}">
                    <td><a href="{{url}}">{{id}}</a></td>
                    <td class="search_result_name"><a title="{{name}}" href="{{url}}">{{name}}</a></td>
                </tr>
                {{/results}}
            </tbody>
        </table>
        <a class='more_results' href="{{search_url}}">More results</a>
        {{/count}}
        {{^results}}
        <div class='no_results'>No matches found</div>
        {{/results}}
    </div>
    {{/resources}}
</div>
</script>
    <script type="text/html" id="warning_message_template">
    <div class="ui-widget" id="warning_message">
    <div class="ui-state-highlight ui-corner-all">
        <span class="ui-icon ui-icon-info"></span>
        <p>{{message}}</p>
    </div>
</div>
</script>

    <!-- notification template -->
    <div id="notifcation-container" style="display:none">
        <div id="basic-template">
            <a class="ui-notify-cross ui-notify-close" href="#">x</a>
            <h1>#{title}</h1>
            <p>#{text}</p>
        </div>
    </div>
</body>
</html>    

Using the following code, I have no trouble finding the element with id="configparse_port_list":

require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<!DOCTYPE html>
<html>
<head>
  <title>SIAM</title>
  <link href="/assets/application-49cce08127ac99204d4cb59e3bfaab8e.css" media="all" rel="stylesheet" type="text/css" />
  <script src="/assets/application-50259c7e8f6a002b7166ab714e68857b.js" type="text/javascript"></script>
  <script src="/assets/controllers/configparse_ports-925b92a6e41f7ffc3014e351d29291fc.js" type="text/javascript"></script>
  <meta content="authenticity_token" name="csrf-param" />
  <meta content="FFh3mbfqnLZhWclBmQ/kEeYSJPeQvapaC0tK9f4wWH8=" name="csrf-token" />
</head>
<body class="configparse_ports_index ctrl_configparse_ports" data-controller="configparse_ports" data-action="index">
<div id="header">
  <a href="https://siam-pro.qa.domain.com/"><img alt="domain_logo" src="/assets/domain_logo-0e44a80f1d9f1f9ce8fb7aa35dbc008b.png" /></a>
  <div>
    <div class="product_name">SIAM</div>
    <div class="version">v5.1</div>
  </div>
  <form accept-charset="UTF-8" action="/search/quick.json" class="ignoreDirty" data-remote="true" id="quick_search" method="post"><div style="margin:0;padding:0;display:inline    "><input name="utf8" type="hidden" value="&#x2713;" /><input name="authenticity_token" type="hidden" value="FFh3mbfqnLZhWclBmQ/kEeYSJPeQvapaC0tK9f4wWH8=" /></div>
    <input id="search_testcases" name="search[testcases]" type="hidden" value="true" />
    <input id="search_testplans" name="search[testplans]" type="hidden" value="true" />
    <input id="search_component_names" name="search[component_names]" type="hidden" value="true" />
    <input autocomplete="off" id="search_term" name="search[term]" placeholder="Search" type="text" />
  </form>
  <ul class="menu">
    <li><a href="https://siam-pro.qa.domain.com/">Home</a></li>
    <li><a href="/settings">Settings</a></li>
  </ul>
</div>
<div id="wrapper">
  <div id="content">
    <div id="loading">Loading ...</div>
    <div id="flash">

    </div>
    <div id="warning_message"></div>
    <h1>Listing Configparse Ports</h1>
    <div id="configparse_port_filters" class="filter_wrap">
      <h4>Filter &nbsp;</h4>
    </div>
    <table id="configparse_port_list">
      <thead>
        <tr>
          <th>ID #</th>
          <th>Name</th>
          <th>ANI Release</th>
          <th>Network Configuration</th>
          <th>State</th>
        </tr>
      </thead>
      <tbody>
      <tr>
        #MANY TRS - one of which I'm looking for based on the 3rd td (ANI Release)
      </tr>
      </tbody>
    </table>

  </div>
</div>
<div id="sidebar">
  <h3>Testcases</h3>
  <ul>
    <li><a href="/testcases/new">New</a></li>
    <li><a href="/search/testcase/new">Search</a></li>
    <li><a href="/search/bugzilla_cr/new">Import RTC</a></li>
  </ul>
  <h3>Testplans</h3>
  <ul>
    <li><a href="/testplans/new">New</a></li>
    <li><a href="/search/testplan/new">Search</a></li>
    <li><a href="/testplans">List Active</a></li>
  </ul>
  <h3>Use Cases</h3>
  <ul>
    <li><a href="/use_cases/new">New</a></li>
    <li><a href="/search/use_case/new">Search</a></li>
    <li><a href="/use_cases/manage">Manage</a></li>
  </ul>
  <h3>Configparse</h3>
  <ul>
    <li><a href="/configparse_ports/new">New</a></li>
    <li><a href="/configparse_ports">List Ports</a></li>
  </ul>
  <h3>Automation</h3>
  <ul>
    <li><a href="/automation_suites/new">New</a></li>
    <li><a href="/search/automation_suite/new">Search</a></li>
    <li><a href="/automation/status">Status</a></li>
  </ul>
</div>
<div id="footer">
  <div>
    <ul class="menu">
      <li><a href="mailto:siam-help@domain.com">Email SIAM Support</a></li>
      <li><a href="http://agora.domain.com/wiki/SIAM">SIAM WIKI</a></li>
    </ul>
    <div class="copyright">&copy; 2012 domain Technologies</div>
  </div>
</div>
<script id="quick_search_results_template" type="text/html">
<div>
        {{#resources}}
        <div class="search_result search_result_{{internal_name}}">
            <h4>{{name}}</h4>
            {{#count}}
            <table>
                <thead>
                    <tr>
                        <th>ID</th>
                        <th></th>
                    </tr>
                </thead>
                <tbody>
                    {{#results}}
                    <tr class="search_result_{{id}}">
                        <td><a href="{{url}}">{{id}}</a></td>
                        <td class="search_result_name"><a title="{{name}}" href="{{url}}">{{name}}</a></td>
                    </tr>
                    {{/results}}
                </tbody>
            </table>
            <a class='more_results' href="{{search_url}}">More results</a>
            {{/count}}
            {{^results}}
            <div class='no_results'>No matches found</div>
            {{/results}}
        </div>
        {{/resources}}
    </div>
</script>
<script type="text/html" id="warning_message_template">
<div class="ui-widget" id="warning_message">
        <div class="ui-state-highlight ui-corner-all">
            <span class="ui-icon ui-icon-info"></span>
            <p>{{message}}</p>
        </div>
    </div>
</script>

<!-- notification template -->
<div id="notifcation-container" style="display:none">
  <div id="basic-template">
    <a class="ui-notify-cross ui-notify-close" href="#">x</a>
    <h1>title</h1>
    <p>text</p>
  </div>
</div>
</body>
</html>    
EOT

After running that, the HTML is parsed and ready to go:

configparse_port_list = doc.at('#configparse_port_list')
configparse_port_list.to_html
# => "<table id=\"configparse_port_list\">\n<thead><tr>\n<th>ID #</th>\n          <th>Name</th>\n          <th>ANI Release</th>\n          <th>Network Configuration</th>\n          <th>State</th>\n        </tr></thead>\n<tbody><tr>\n        #MANY TRS - one of which I'm looking for based on the 3rd td (ANI Release)\n      </tr></tbody>\n</table>"

One thing I'd be careful about:

doc.css("#configparse_port_list")

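is a contradiction. css is meant to return all nodes that meet a given condition. #configparse_port_list can exist only once in a document, because it is an ID. Nokogiri will happily return a NodeSet containing that one element for css, but this can confuse other people who aren't reading the code closely. I'd recommend writing it as at("#configparse_port_list") instead, because at returns a single element, which keeps the code in sync with the fact that only one ID can match.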

doc.css('#configparse_port_list').class # => Nokogiri::XML::NodeSet
doc.css('#configparse_port_list').size # => 1

These work too; just keep in mind the earlier caveat about css and single elements:

doc.css('table[@id="configparse_port_list"]').size # => 1
doc.css('table#configparse_port_list').size # => 1

You may want to check that your Nokogiri and libXML2 installation is up to date:

nokogiri -v

The current Nokogiri release is 1.6.0.

Note that Nokogiri is not happy with the document:

doc.errors
# => [#<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>,
#     #<Nokogiri::XML::SyntaxError: Element script embeds close tag>]

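I was stuck behind a pubcookie authentication server. Once I got past authentication, I could access the HTML table the way I had originally tried (although .at is preferable when fetching a node by id).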
