Get the maximum value in a table using XPath



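I have a large HTML menu file in a generic file format, and I need to get the highest price for each menu item. Here is a sample of part of the menu file: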

### File Name: "menu" (All types ".") ###
</div>
     <div class="menu-item-prices">
       <table>
        <tr>
            <td class="menu-item-price-amount">
                10
            </td>
            <td class="menu-item-price-amount">
                14
            </td>
        </tr>
</div>
</div>
     <div class="menu-item-prices">
       <table>
        <tr>
            <td class="menu-item-price-amount">
                100
            </td>
            <td class="menu-item-price-amount">
                1
            </td>
        </tr>
</div>

I need my program to return a list of the highest price for each menu item, i.e. maxprices=['14','100'] in this example. I have tried the following code in Python:

#!/user/bin/python
from lxml import html
from os.path import join, dirname, realpath
from lxml.etree import XPath
def main():
    """ Drive function """
    fpath = join(dirname(realpath(__file__)), 'menu')
    hfile = open(fpath)  # open html file
    tree = html.fromstring(hfile.read())
    prices_path = XPath('//*[@class="menu-item-prices"]/table/tr')  
    maxprices = []
    for p in prices_path(tree):
        prices = p.xpath('//td/text()')
        prices = [el.strip() for el in prices]
        maxprice = max(prices)
        maxprices.append(maxprice)
        print maxprices
if __name__ == '__main__':
    main()

I also tried

prices = tree.xpath('//*[@class="menu-item-prices"]'
                    '//tr[not(../tr/td > td)]/text()')
prices = [el.strip() for el in prices]

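instead of the loop strategy above. Neither returns the highest price I need for each category. How can I modify my code to get these prices correctly? Thanks.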

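There is at least one problem: you are comparing strings, but you need to convert the prices to float and then take the maximum for each table row. The inner XPath should also be relative to the current row (.//td rather than //td), since //td matches every cell in the document no matter which row it is called on.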
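To see why comparing strings is a problem, here is a minimal sketch in plain Python. The price values are made up for illustration and are not taken from the menu file; they are chosen so that the text ordering and the numeric ordering disagree.

```python
# Price strings as they might look after text() and strip(); illustrative values only.
prices = ["9", "10", "100"]

# Strings compare character by character, so "9" sorts above "10" and "100".
max_as_text = max(prices)

# Converting to float compares by numeric value instead.
max_as_number = max(float(p) for p in prices)

# key=float compares numerically but returns the original string.
max_keep_string = max(prices, key=float)

print(max_as_text, max_as_number, max_keep_string)  # 9 100.0 100
```

The key=float form may be handy if the result should stay a list of strings such as ['14','100'].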

Complete example:

# -*- coding: utf-8 -*-
from lxml.html import fromstring
data = """
<div>
     <div class="menu-item-prices">
       <table>
            <tr>
                <td class="menu-item-price-amount">
                    10
                </td>
                <td class="menu-item-price-amount">
                    14
                </td>
            </tr>
        </table>
    </div>
    <div class="menu-item-prices">
       <table>
        <tr>
            <td class="menu-item-price-amount">
                100
            </td>
            <td class="menu-item-price-amount">
                1
            </td>
        </tr>
        </table>
    </div>
</div>
"""
tree = fromstring(data)
for item in tree.xpath("//div[@class='menu-item-prices']/table/tr"):
    prices = [float(price.strip()) for price in item.xpath(".//td[@class='menu-item-price-amount']/text()")]
    print(max(prices))

Prints:

14.0
100.0
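If a single XPath expression is preferred, closer to the second attempt in the question, the following sketch may work. It assumes well-formed markup like the sample above and relies on XPath 1.0 converting both sides of "<" to numbers, which tolerates the whitespace around each price. The trimmed data string is an illustrative copy of the sample, not the real menu file.

```python
from lxml.html import fromstring

# Trimmed copy of the sample markup, with the tables closed properly.
data = """
<div>
  <div class="menu-item-prices">
    <table><tr>
      <td class="menu-item-price-amount"> 10 </td>
      <td class="menu-item-price-amount"> 14 </td>
    </tr></table>
  </div>
  <div class="menu-item-prices">
    <table><tr>
      <td class="menu-item-price-amount"> 100 </td>
      <td class="menu-item-price-amount"> 1 </td>
    </tr></table>
  </div>
</div>
"""

tree = fromstring(data)

# Keep a td only if no td in the same row has a larger numeric value.
cells = tree.xpath(
    "//div[@class='menu-item-prices']/table/tr"
    "/td[not(. < ../td)]"
)

# Strip the surrounding whitespace and keep the prices as strings.
maxprices = [cell.text.strip() for cell in cells]
print(maxprices)  # ['14', '100']
```

Note that if two cells in a row tie for the highest price, this expression returns both of them, so the Python loop above is the safer choice when exactly one value per row is required.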
