Filter/exclude part of an XPath extraction by "pattern"

This is what I have to work with:

<div class="Pictures zoom">
<a title="Productname 1" class="zoomThumbActive" rel="{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}" href="javascript:void(0)" style="border-width:inherit;">
<img title="Productname 1" src="/images/24.245/mini-doge-picture.jpeg" alt="" /></a>
<a title="Productname 1" rel="{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}" href="javascript:void(0)" style="border-width:inherit;">
<img title="Productname 1" src="/images/22.999/this-picture-is-very-small.jpeg" alt="" /></a>
</div>

Using the following XPath:

/html//div[@class='Pictures zoom']/a/@rel

the output is:

{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}
{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}

Is it possible to filter the extraction so that, for the example above, I only get these:

/images/76.24561/big-one-picture.jpeg
/images/9.5664/very-big-one-picture.jpeg

I only want to keep everything between largeimage: ' and '}

Best regards

Liu Kang

Use substring-before and substring-after to cut away the parts you don't need.

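With XPath 1.0, you can only do this for a single result at a time, so you cannot get all the URLs contained in a document with a single XPath call. This query returns the first URL: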

substring-before(substring-after((//@rel)[1], "largeimage: '"), "'")
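
If you are stuck with an XPath 1.0 engine, one common workaround is to select the nodes with XPath and do the string cutting per node in the host language. The following is only a minimal sketch under assumptions: it uses Python's standard-library ElementTree (which implements a small XPath subset), a trimmed copy of the markup from the question with the closing </div> assumed, and the helper names large_image and large_images are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the question (closing </div> is assumed).
HTML = """<div class="Pictures zoom">
<a title="Productname 1" rel="{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}" href="javascript:void(0)"><img title="Productname 1" src="/images/24.245/mini-doge-picture.jpeg" alt="" /></a>
<a title="Productname 1" rel="{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}" href="javascript:void(0)"><img title="Productname 1" src="/images/22.999/this-picture-is-very-small.jpeg" alt="" /></a>
</div>"""


def large_image(rel):
    """Mimic substring-before(substring-after(rel, "largeimage: '"), "'")."""
    # substring-after: everything following the marker, or "" if it is absent
    after = rel.partition("largeimage: '")[2]
    # substring-before: everything up to the next quote, or "" if there is none
    head, sep, _ = after.partition("'")
    return head if sep else ""


def large_images(markup):
    """Hypothetical helper: collect one URL per <a rel="..."> element."""
    root = ET.fromstring(markup)
    # The XPath subset selects the nodes; the string work happens in Python.
    return [large_image(a.get("rel")) for a in root.findall(".//a[@rel]")]


urls = large_images(HTML)
print(urls)
```

Note that ElementTree needs well-formed XML, so real-world HTML would likely have to go through an HTML-aware parser first.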

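XPath 2.0 allows you to use a function call as an axis step. This query returns all the URLs you are looking for, as one separate string per @rel attribute: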

//@rel/substring-before(substring-after(., "largeimage: '"), "'")
