I am trying to scrape the PM2.5 value from https://openaq.org/#/location/Algiers?_k=nv8w8w, but it always returns a null value.
def getCardDetails(country, url):
    local_df = pd.DataFrame(columns=['country', 'card_url', 'general', 'country_link', 'city', 'PM2.5', 'date', 'hour'])
    pm = None
    date = None
    hour = None
    general = None
    city = None
    country_link = None
    try:
        #wait = WebDriverWait(driver, 3)
        #wait.until(EC.presence_of_element_located((By.ID, 'location-fold-stats')))
        time.sleep(2)
        # Using XPath we are getting the full text of the sibling that comes
        # after the text containing "PM2.5". We will split the full text to
        # generate variables for our DataFrame such as "pm", "date" & "hour".
        try:
            print("inn")
            pm_date = driver.find_element(By.XPATH, '//dt[text() = "PM2.5"]/following-sibling::dd[1]').text
            # Scraping pollution details from each location page
            # and splitting them to save in the relevant variables
            text = pm_date.split('µg/m³ at ')
            print("nn", pm_date)
            pm = float(text[0])
            full_date = text[1].split(' ')
            date = full_date[0]
            hour = full_date[1]
This is my first time using Selenium for web scraping. I would like to know how XPath works and what the problem is here.
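Your XPath is correct. The element is rendered dynamically, so a fixed time.sleep(2) may return before it appears on the page. To get the value from a dynamic element, you need to use WebDriverWait() and wait for visibility_of_element_located():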
print(WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH,'//dt[text() = "PM2.5"]/following-sibling::dd[1]'))).text)
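If it helps, here is a minimal sketch of how the wait and the splitting step from the question could fit together. The names parse_pm_reading, read_pm25 and PM25_XPATH are made up for illustration, and the sample text in the usage note is only an assumed shape based on how the question's code splits the string; I have not checked the exact text the page returns.

```python
from typing import Optional, Tuple

# Same XPath as in the question: the first <dd> that follows the <dt> labelled "PM2.5".
PM25_XPATH = '//dt[text() = "PM2.5"]/following-sibling::dd[1]'


def parse_pm_reading(raw: str) -> Optional[Tuple[float, str, str]]:
    """Split text shaped like '<value> µg/m³ at <date> <hour>' into (pm, date, hour).

    Returns None when the text does not have that (assumed) shape.
    """
    value_part, sep, when = raw.partition("µg/m³ at ")
    if not sep:
        return None  # separator not found, so the layout is not what we expect
    parts = when.split()
    if len(parts) < 2:
        return None  # date or hour is missing
    try:
        pm = float(value_part)  # float() ignores surrounding whitespace
    except ValueError:
        return None
    return pm, parts[0], parts[1]


def read_pm25(driver, timeout: float = 10) -> Optional[Tuple[float, str, str]]:
    """Wait until the PM2.5 <dd> is visible, then parse its text."""
    # Imported here so that parse_pm_reading can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.XPATH, PM25_XPATH))
    )
    return parse_pm_reading(element.text)
```

With an already opened driver you would call read_pm25(driver). If the element never becomes visible within the timeout, WebDriverWait raises a TimeoutException instead of silently leaving the value as None, which makes this kind of problem easier to spot. The parser can be tried on its own, for example parse_pm_reading("10.5 µg/m³ at 2020/05/03 14:00").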