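I am trying to scrape a frame within a website (https://www.harris.com/careers/jobs). The XPath for the location of the first job listed is
/html/body/center/table[2]/tbody/tr/td/form/table[3]/tbody/tr[3]/td/table/tbody/tr[3]/td[4]/span
I am trying to extract the text inside the span using the lxml library in Python. My code currently looks like this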
from lxml import html
import requests
page = requests.get('https://www.harris.com/careers/jobs')
tree = html.fromstring(page.content)
location = tree.xpath('/html/body/center/table[2]/tbody/tr/td/form/table[3]/tbody/tr[3]/td/table/tbody/tr[3]/td[4]/span/text()')
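Unfortunately, the command
print(location)
yields the following
[]
I am fairly sure something is wrong with the XPath, and that it can be improved to extract the text I need.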
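One likely contributor to the empty list, separate from the frame itself: browsers insert tbody elements when they build the DOM, so an XPath copied from developer tools contains /tbody steps that lxml's parser does not create on its own. A minimal sketch on made-up markup (SAMPLE is hypothetical, not the real page) shows the difference:

```python
from lxml import html

# Hypothetical markup standing in for the real page; note there is no <tbody>.
SAMPLE = """
<html><body>
<table>
  <tr><td><span>Melbourne, FL</span></td></tr>
</table>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# XPath as copied from a browser, which adds <tbody> to its DOM.
with_tbody = tree.xpath('/html/body/table/tbody/tr/td/span/text()')

# Same path without the tbody step, matching what lxml actually parsed.
without_tbody = tree.xpath('/html/body/table/tr/td/span/text()')

print(with_tbody)
print(without_tbody)
```

Dropping the tbody steps only helps if the span is present in the HTML that requests downloads; content loaded into a frame is not part of that response.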
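Here is working code. The job list sits inside a frame named frmJobs, so this uses Selenium to switch into that frame before locating elements: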
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Path to the ChromeDriver binary; Selenium 4 takes this via a Service object instead.
driver = webdriver.Chrome('./chromedriver.exe')
try:
    driver.get("https://www.harris.com/careers/jobs")
    # The job list lives inside the frame named "frmJobs".
    driver.switch_to.frame("frmJobs")
    time.sleep(5)  # crude wait for the frame to load
    # s = driver.find_element(By.ID, "searchbuttonBtn__a")
    s = driver.find_element(By.XPATH, "//input[@class='submitbutton']")
    driver.execute_script("return arguments[0].scrollIntoView();", s)
    print(s.get_attribute("value"))
    s.send_keys("n")
    time.sleep(10)  # crude wait for the search results
    # Print the link of every job row in the results table.
    for a in driver.find_elements(By.XPATH, "//td[@class='listheadingbackground']/table/tbody/tr/td[2]/span/a"):
        print(a.get_attribute("href"))
except Exception as e:
    print(e)
driver.quit()
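If a full browser is more than you want, a possible alternative is to read the frame's src attribute from the outer page and request that URL directly. The sketch below is only an illustration: frame_urls is a hypothetical helper, the sample markup and example.com URLs are made up, and it assumes the frame content is served as plain HTML, which has not been checked for this site.

```python
from urllib.parse import urljoin
from lxml import html


def frame_urls(page_url, page_source):
    """Return absolute URLs of every <iframe>/<frame> found in page_source."""
    tree = html.fromstring(page_source)
    srcs = tree.xpath('//iframe/@src | //frame/@src')
    # Frame src values are often relative, so resolve them against the page URL.
    return [urljoin(page_url, src) for src in srcs]


# Made-up markup; the real page's frame attributes are not known here.
SAMPLE = (
    '<html><body>'
    '<iframe name="frmJobs" src="/frames/jobs.html"></iframe>'
    '</body></html>'
)

urls = frame_urls('https://example.com/careers/jobs', SAMPLE)
print(urls)
```

Each returned URL could then be fetched with requests.get and parsed with lxml as in the question. If the frame builds its table with JavaScript, this will not work and Selenium remains the safer route.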