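I am learning XPath and trying to use Python's lxml.html to get the value of a node that has a particular attribute, using a Google Play Store page as an example. From the markup below, I want to get the developer email from the "a" node whose "href" attribute starts with "mailto:". My Python snippet returns the app name but an empty developer email. Thanks.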
<html>
  <div class="id-app-title" tabindex="0">Candy Crush Saga</div>
  <div class="meta-info meta-info-wide">
    <div class="title"> Developer </div>
    <a class="dev-link" href="https://www.google.com/url?q=http://candycrush.com" rel="nofollow" target="_blank"> Visit website </a>
    <!-- Interesting part here -->
    <a class="dev-link" href="mailto:candycrush@kingping.com" rel="nofollow" target="_blank">candycrush@kingping.com </a>
  </div>
</html>
Python code (2.7)
def get_app_from_link(self, link):
    start_page = requests.get(link)
    #print start_page.text
    tree = html.fromstring(start_page.text)
    name = tree.xpath('//div[@class="id-app-title"]/text()')[0]
    #developer=tree.xpath('//div[@class="dev-link"]//*/div/@href')
    developer = tree.xpath('//div[contains(@href,"mailto") and @class="dev-link"]/text()')
    print name, developer
    return
Your XPath currently selects the div tag, but the email link is an a tag. Select a instead:
'//a[contains(@href,"mailto") and @class="dev-link"]/text()'
Also, your function does not return anything. Use return like this:
def get_app_from_link(self, link):
    # your code
    return name, developer
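To put both fixes together, here is a minimal sketch that runs against the sample markup from the question instead of a live page. It is written for Python 3 (the question uses 2.7), and the names SAMPLE and extract_app_info are made up for this illustration. It also assumes that starts-with() fits better than contains(), since the question asks for an href that starts with "mailto:", and it reads the address from @href rather than from the link text.

```python
from lxml import html

# Sample markup copied from the question (hypothetical stand-in for the page body).
SAMPLE = """
<html>
<div class="id-app-title" tabindex="0">Candy Crush Saga</div>
<div class="meta-info meta-info-wide">
<div class="title"> Developer </div>
<a class="dev-link" href="https://www.google.com/url?q=http://candycrush.com" rel="nofollow" target="_blank"> Visit website </a>
<a class="dev-link" href="mailto:candycrush@kingping.com" rel="nofollow" target="_blank">candycrush@kingping.com </a>
</div>
</html>
"""


def extract_app_info(page_source):
    """Return (app_name, developer_email); either may be None if not found."""
    tree = html.fromstring(page_source)

    # The title lives in a div, so selecting div is correct here.
    names = tree.xpath('//div[@class="id-app-title"]/text()')

    # The email link is an <a> element, not a div; match on the href prefix.
    hrefs = tree.xpath(
        '//a[@class="dev-link" and starts-with(@href, "mailto:")]/@href'
    )

    name = names[0].strip() if names else None
    # Drop the "mailto:" scheme to keep only the address.
    email = hrefs[0][len("mailto:"):].strip() if hrefs else None
    return name, email


name, email = extract_app_info(SAMPLE)
print(name, email)
```

Returning a tuple lets the caller decide what to do with the values, and checking for an empty result list avoids an IndexError on pages that have no email link. For a live page you would pass requests.get(link).text as page_source.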