XPath problem with multiple OR expressions such as (a|b|c)



I have simplified the HTML:

<html>
<main>
<span>one</span>
</main>
<not_important>
<div>skip_me</div>
</not_important>
<support>
<div>two</div>
</support>
</html>

I want to find only one and two, on the condition that the parent tag is main or support and the tag that follows it is span or div.

I would like to know why this code does not work:

import lxml.html as HTML_PARSER
html = """
<html>
<main>
<span>one</span>
</main>
<not_important>
<div>skip_me</div>
</not_important>
<support>
<div>two</div>
</support>
</html>
"""
parent = '//main | //support'
child = '/span | /div'
doc = HTML_PARSER.fromstring(html)
print(doc)
xpath = '(%s)(%s)' % (parent, child)
print(xpath)  # (//main | //support)(/span | /div)
parsed = doc.xpath(xpath)  # raises XPathEvalError: Invalid expression
print(parsed)

I get an Invalid expression error. Why?

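The expression (//main | //support) and the expression (/span | /div) are each valid on their own. A simple combination such as (//main | //support)/span is valid too. So why is the more complex combination (//main | //support)(/span | /div) invalid, and how can I fix it?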

In my real case, //main, //support, /span and /div are very complex expressions, so I would like a generic solution along the lines of (xpath1 | xpath2)(xpath3 | xpath4).

This will find them, though I am not 100% sure it is what you want:

//*[name() = 'main' or name() = 'support']/*[name() = 'span' or name() = 'div']/text()

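Your XPath is not valid in XPath 1.0 (the version lxml uses). In XPath 1.0 a parenthesized expression may only be followed by a predicate or by / (or //) and a location step, and a location step cannot itself be a parenthesized union.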

Try:

xpath = '//div[parent::support]|//span[parent::main]'
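For the general (xpath1 | xpath2)(xpath3 | xpath4) case from the question, one possible XPath 1.0 workaround is to expand the product into an explicit union of every parent/child pair. The sketch below is only an illustration: combine_paths is a hypothetical helper name, and it assumes each child expression is a relative continuation that begins with / or //.

```python
from itertools import product


def combine_paths(parents, children):
    """Expand (p1 | p2)(c1 | c2) into an explicit XPath 1.0 union.

    Hypothetical helper: each parent is wrapped in parentheses so that a
    parent containing its own union still binds correctly, and each child
    is assumed to start with '/' or '//'.
    """
    pairs = product(parents, children)
    return ' | '.join('(%s)%s' % (parent, child) for parent, child in pairs)


xpath = combine_paths(['//main', '//support'], ['/span', '/div'])
print(xpath)
# (//main)/span | (//main)/div | (//support)/span | (//support)/div
```

The resulting string can be passed to doc.xpath(...) as before. The expression grows with the number of pairs, but each piece can remain an arbitrary path expression rather than a bare tag name.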

parent = ['main', 'support']
child = ['span', 'div']
xpath = '//*[self::{0[0]} or self::{0[1]}]/*[self::{1[0]} or self::{1[1]}]'.format(parent, child)
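The format string above is hard-wired to exactly two names per level. A minimal sketch of the same idea for any number of tag names follows; any_of and build_xpath are hypothetical helper names, and the names are assumed to be plain element names without namespaces.

```python
def any_of(names):
    """Build a step that matches any of the given element names via self::."""
    tests = ' or '.join('self::%s' % name for name in names)
    return '*[%s]' % tests


def build_xpath(parents, children):
    """Select elements with a child name directly under any parent name."""
    return '//%s/%s' % (any_of(parents), any_of(children))


xpath = build_xpath(['main', 'support'], ['span', 'div'])
print(xpath)
# //*[self::main or self::support]/*[self::span or self::div]
```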

You can use the self:: axis:

(//main | //support)[*[self::div or self::span]]
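Note that the expression above filters with a predicate, so it returns the main and support elements themselves rather than their span or div children. A minimal sketch, assuming lxml is installed and reusing the question's document, compares it with a step-based variant (the second expression is an illustrative alternative, not part of the original answer):

```python
import lxml.html

html = """
<html>
<main>
<span>one</span>
</main>
<not_important>
<div>skip_me</div>
</not_important>
<support>
<div>two</div>
</support>
</html>
"""
doc = lxml.html.fromstring(html)

# Predicate form: keeps the parents that have a matching child.
parents = doc.xpath('(//main | //support)[*[self::div or self::span]]')

# Step form: walks down to the matching children instead.
children = doc.xpath('(//main | //support)/*[self::div or self::span]')

parent_tags = [el.tag for el in parents]
child_texts = [el.text for el in children]
print(parent_tags)
print(child_texts)
```

With the question's document, the first list should hold the two parent tags and the second the texts one and two; skip_me is left out because its parent is neither main nor support.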
