What is the correct XPath query for 'select onchange'?



I'm learning XPath and want to extract the URLs embedded in the HTML below. I have tried variations of @"//table[contains(@option, 'value')]", but without success.

<body>
<div id="Wrapper">
<div id="header">
<span id="logoHolder">
<a href="http://www.foo.com">
<img src="/templates/blank_j15/images/nexus_logo.png" width="167" height="65" border="0"/>
</a>
</span>
<span style="float: left; padding-top: 27px; padding-right: 0px; padding-bottom: 0px; padding-left: 0px; color: rgb(0, 182, 222); ">Embracing Diversity. Challenging Minds.</span>
<span id="searchHolder">
<div style="clear: both; "/>
<div id="IE_P_space"/>
<div id="arttotalmenucontent_138" class="hidden">
<script type="text/javascript">
<table cellspacing="0" cellpadding="0" border="0" width="100%" id="wrapper_cont_table">
<tbody>
<tr>
<tr>
<tr>
<td valign="top" id="wrapper_cont_leftNav">
<div class="leftnavCont">
<p>
<select onchange="nl(this.value)" size="8">
<option value="/images/download/newsletter/connect04_300911.pdf">Connect 04: 30/09/2011</option>
<option value="/images/download/newsletter/connect03_230911.pdf">Connect 03: 23/09/2011</option>
<option value="/images/download/newsletter/connect02_150911.pdf">Connect 02: 15/09/2011</option>
<option value="/images/download/newsletter/connect01_120911.pdf">Connect 01: 12/09/2011</option>
</p>
//p/select/option/@value

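seems to work for me.

I think the problem must be in how you are using your XPath library. It didn't take me long to find the source of your sample.

Here is a working example using my preferred XML library, lxml.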

#!/usr/bin/env python3
import os
from urllib.request import urlopen
from lxml import etree

filename = 'sample.html'
url = 'http://www.foo.example/index.php?option=com_content&view=article&id=186&Itemid=301'
# Some simple caching for a test script...
if os.path.exists(filename):
    # Read bytes so lxml can detect the page encoding itself
    with open(filename, 'rb') as f:
        data = f.read()
else:
    data = urlopen(url).read()
    with open(filename, 'wb') as f:
        f.write(data)
# etree.HTML uses lxml's forgiving HTML parser, so unclosed tags are tolerated
doc = etree.HTML(data)
# The trailing @value selects the attribute, so each result is a string
for v in doc.xpath('//p/select/option/@value'):
    print(v)

Produces:

/images/download/newsletter/connect04_300911.pdf
/images/download/newsletter/connect03_230911.pdf
/images/download/newsletter/connect02_150911.pdf
/images/download/newsletter/connect01_120911.pdf
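If you want to check the path without fetching the page or installing lxml, here is a minimal sketch using only the standard library's xml.etree.ElementTree. It rests on a few assumptions: FRAGMENT is a hypothetical, cut-down and well-formed copy of the markup in the question (ElementTree needs closed tags, unlike lxml's HTML parser), and option_values is a helper name made up for this illustration. ElementTree only supports a subset of XPath and cannot return attribute nodes, so the sketch selects the option elements and reads value from each. It also shows that option is an element and value is its attribute, so a predicate looking for an attribute named option on the table finds nothing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed and shortened copy of the markup from the question.
FRAGMENT = """
<div id="Wrapper">
  <table id="wrapper_cont_table">
    <tbody>
      <tr>
        <td id="wrapper_cont_leftNav">
          <div class="leftnavCont">
            <p>
              <select onchange="nl(this.value)" size="8">
                <option value="/images/download/newsletter/connect04_300911.pdf">Connect 04: 30/09/2011</option>
                <option value="/images/download/newsletter/connect03_230911.pdf">Connect 03: 23/09/2011</option>
              </select>
            </p>
          </div>
        </td>
      </tr>
    </tbody>
  </table>
</div>
""".strip()


def option_values(markup):
    """Return the value attribute of every <option> found under p/select."""
    root = ET.fromstring(markup)
    # ElementTree has no trailing /@value step, so select the elements
    # and read the attribute from each one.
    return [opt.get("value") for opt in root.findall(".//p/select/option")]


values = option_values(FRAGMENT)
for value in values:
    print(value)

# The <table> element has no attribute called "option", so this matches nothing.
no_match = ET.fromstring(FRAGMENT).findall(".//table[@option]")
print(len(no_match))
```

With lxml the equivalent of option_values is the single expression '//p/select/option/@value', as in the script above; the ElementTree form is only a convenience for a quick offline check.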
