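Python scraping with lxml prints empty brackets

I am trying to extract a few characters from a website with lxml: I fetch the page, build a tree from it, and then query the tree with an XPath. I used Google Chrome to get the correct XPath, but the script prints empty brackets.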

    #imports
    from lxml import html
    import requests
    #get magicseaweed Scripps report
    msScrippsPage = requests.get("""http://magicseaweed.com/Scripps-Pier-
    La-Jolla-Surf-Report/296/.html""")
    #make tree from site
    msScrippsTree = html.fromstring(msScrippsPage.content)
    #get wave size
    msScrippsWave = msScrippsTree.xpath("""/html/body/div[2]/div[5]/div/div[1]/div[2]/div[2]/div/div[2]/div[1]/div/div[1]/div/div/div/div/div[1]/div/div[2]/ul[1]/li[1]/text()""")
    print 'ms SCripps: ', msScrippsWave

The output in the terminal is 'ms SCripps:  []'

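You should not put a line break in the URL. A triple-quoted string keeps the line break (and any indentation on the following line) as part of the string, so the URL that gets requested is not the one you intended. With the URL on a single line, your XPath works: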

    msScrippsPage = requests.get("""http://magicseaweed.com/Scripps-Pier-La-Jolla-Surf-Report/296/.html""")
    # build the tree and run the XPath exactly as in the question, then:
    print msScrippsWave
    ['    0.4-0.6', '   ']
    ########################################
    # the line break and the indentation end up inside the string
    url = """http://magicseaweed.com/Scripps-Pier-
        La-Jolla-Surf-Report/296/.html"""
    print repr(url)
    'http://magicseaweed.com/Scripps-Pier-\n    La-Jolla-Surf-Report/296/.html'
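If the URL is too long to fit comfortably on one line, one option is to rely on Python's implicit concatenation of adjacent string literals inside parentheses, which inserts nothing between the pieces. The following is a minimal sketch in Python 3 syntax; the names `url_broken`, `url_joined`, and `has_whitespace` are made up for illustration and do not come from the question.

```python
# Triple-quoted string: the line break and the indentation of the
# second line become part of the URL.
url_broken = """http://magicseaweed.com/Scripps-Pier-
    La-Jolla-Surf-Report/296/.html"""

# Adjacent string literals inside parentheses are joined at compile
# time with nothing in between, so the URL stays intact.
url_joined = (
    "http://magicseaweed.com/Scripps-Pier-"
    "La-Jolla-Surf-Report/296/.html"
)


def has_whitespace(url):
    """Return True if the URL contains any whitespace character."""
    return any(ch.isspace() for ch in url)


print(repr(url_broken))
print(repr(url_joined))
print(has_whitespace(url_broken), has_whitespace(url_joined))
```

A check like `has_whitespace` can be run on a URL before passing it to `requests.get`, which makes this kind of mistake easier to spot.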

Edit: added a full example

    from lxml import html
    import requests
    msScrippsPage = requests.get("""http://magicseaweed.com/Scripps-Pier-La-Jolla-Surf-Report/296/.html""")
    msScrippsTree = html.fromstring(msScrippsPage.content)
    msScrippsWave = msScrippsTree.xpath("""/html/body/div[2]/div[5]/div/div[1]/div[2]/div[2]/div/div[2]/div[1]/div/div[1]/div/div/div/div/div[1]/div/div[2]/ul[1]/li[1]/text()""")
    print 'ms SCripps: ', msScrippsWave
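The list returned by a `text()` XPath, such as ['    0.4-0.6', '   '], still carries padding and a whitespace-only entry. Below is a small sketch of one way to tidy it, written in Python 3 syntax. `clean_text_nodes` is a hypothetical helper, and the sample list is copied from the output reported in this answer rather than fetched live.

```python
def clean_text_nodes(nodes):
    """Strip each text node and drop the ones that are empty afterwards."""
    stripped = (node.strip() for node in nodes)
    return [text for text in stripped if text]


# Sample copied from the reported output; a real run would pass the
# list returned by msScrippsTree.xpath(...) instead.
raw = ['    0.4-0.6', '   ']
wave = clean_text_nodes(raw)

if wave:
    print('ms Scripps:', wave[0])
else:
    # An empty list means the XPath matched nothing, for example
    # because of a wrong URL or a changed page layout.
    print('ms Scripps: no match')
```

Checking for an empty list before indexing avoids an `IndexError` when the XPath matches nothing, which is the situation described in the question.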
