For some reason, I can't access the number and the title using XPath.
Here is the HTML:
<div class="style-scope classification-tree">
<state-modifier class="code style-scope classification-tree" act='{"type": "QUERY_ADD_CPC", "cpc": "$cpc"}' first="true" data-cpc="C07C311/51">
<a id="link" href="/?q=C07C311%2f51" class="style-scope state-modifier">C07C311/51</a>
</state-modifier>
<span class="description style-scope classification-tree">Y being a hydrogen or a carbon atom</span>
</div>
So far, I have tried the following code:
Class_Content_year = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "//state-modifier[@class='code style-scope classification-tree']//a[contains(@id, 'link') and contains(@class, 'style-scope state-modifier')]"))).text
Class_Content_title = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree']//span[contains(@class, 'description style-scope classification-tree')]"))).text
It should get the text inside <a> and <span>.
However, this error occurs:
Traceback (most recent call last):
  File "<ipython-input-2-dfe4f1a9b070>", line 97, in openURL
    Class_Content = Class(driver, Current_Content)
  File "c:\Users\jyg\Desktop\MTExtract_data_2.py", line 57, in Class
    Class_Content_year = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree']//state-modifier[contains(@class, 'code style-scope classification-tree']/child::a[contains(@id, 'link') and contains(@class, 'style-scope state-modifier')]"))).text
  File "C:\Users\jyg\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Can anyone help? Thanks!
Here is the XPath to use.
Code:
Class_Content_year = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree']//state-modifier[contains(@class, 'code style-scope classification-tree')]//a[contains(@id, 'link') and contains(@class, 'style-scope state-modifier')]")))
# now get the text from the link
print(Class_Content_year.text)
# now get the text from the span
print(driver.find_element(By.XPATH, "//div[@class='style-scope classification-tree']//span[@class='description style-scope classification-tree']").text)
Here are other possible XPaths:
//div[@class='style-scope classification-tree']//a[@class='style-scope state-modifier']
For the span, you can use the following XPath:
//div[@class='style-scope classification-tree']//span[@class='description style-scope classification-tree']
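As a quick offline sanity check, the two XPaths above can be tried against the HTML from the question without a browser. This is only a sketch: it uses Python's standard xml.etree.ElementTree, which supports just a subset of XPath (exact attribute matches, no contains()), wraps the snippet in a made-up <root> element, and drops the act attribute. If the paths match here but Selenium still times out, the cause is more likely the page state (for example, hidden or not-yet-rendered elements) than the XPath itself.

```python
import xml.etree.ElementTree as ET

# HTML from the question, wrapped in a hypothetical <root> element so the
# div is a descendant of the tree root; the act attribute is left out.
SNIPPET = """
<root>
  <div class="style-scope classification-tree">
    <state-modifier class="code style-scope classification-tree" first="true" data-cpc="C07C311/51">
      <a id="link" href="/?q=C07C311%2f51" class="style-scope state-modifier">C07C311/51</a>
    </state-modifier>
    <span class="description style-scope classification-tree">Y being a hydrogen or a carbon atom</span>
  </div>
</root>
"""

root = ET.fromstring(SNIPPET)

# Same paths as above, prefixed with "." because ElementTree paths are
# relative to the element they are evaluated on.
code = root.find(
    ".//div[@class='style-scope classification-tree']"
    "//a[@class='style-scope state-modifier']"
)
title = root.find(
    ".//div[@class='style-scope classification-tree']"
    "//span[@class='description style-scope classification-tree']"
)

print(code.text)
print(title.text)
```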
To extract the text C07C311/51, use visibility_of_element_located() instead of presence_of_element_located(). You can use the following locator strategy:
- Using XPATH:
driver.get("https://patents.google.com/patent/JP2009517369A/en?oq=JP2009517369]")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='style-scope classification-tree' and not(@hidden)]/state-modifier[@class='code style-scope classification-tree']/a[@class='style-scope state-modifier']"))).get_attribute("innerHTML"))
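The not(@hidden) predicate in the XPath above suggests that the page may contain several classification-tree div elements, some of them hidden. That is an assumption here; the question's snippet shows only one div. A hidden element is present in the DOM, but Selenium's .text returns an empty string for an element that is not displayed, which is one reason to wait for visibility rather than presence. A minimal offline sketch with made-up markup (the hidden block and its C07C text are hypothetical), using the standard library instead of a browser:

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: two divs share the same class, the first is hidden.
PAGE = """
<root>
  <div class="style-scope classification-tree" hidden="">
    <state-modifier class="code style-scope classification-tree">
      <a class="style-scope state-modifier">C07C</a>
    </state-modifier>
  </div>
  <div class="style-scope classification-tree">
    <state-modifier class="code style-scope classification-tree">
      <a class="style-scope state-modifier">C07C311/51</a>
    </state-modifier>
  </div>
</root>
"""

root = ET.fromstring(PAGE)
divs = root.findall(".//div[@class='style-scope classification-tree']")

# Python stand-in for the not(@hidden) predicate, because ElementTree's
# XPath subset has no not() function.
shown = [d for d in divs if "hidden" not in d.attrib]

# Without the filter, the first match is the hidden div.
first_any = divs[0].find("state-modifier/a").text
# With the filter, the first match is the div that is not hidden.
first_shown = shown[0].find("state-modifier/a").text

print(first_any)
print(first_shown)
```

In a real browser the first case would be worse than shown here: a presence-based wait would return the hidden link and .text would come back empty.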
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC