I have an HTML page that contains 40 divs like the following:
<div class='movie-featured'>
  <div class="item analytics">
    <div class="movie-details">
      <div class="movie-rating-wrapper">
        <span class="movie-rating-summary">
          <span>some text</span>
        </span>
      </div>
    </div>
  </div>
</div>
I am trying to get the text of the inner <span>some text</span> inside each of the 40 divs with:
find_element_by_css_selector('span.moview-rating-summary').find_element_by_tag_name('span').text
Output:
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '6/10', '', '', '', '', '', '', '', '', '7.5/10', '', '', '', '', '']
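As you can see, I only get text from a few of the spans, not from all of them.
I also tried find_element_by_tag_name('span').get_attribute('textContent') and find_element_by_tag_name('span').get_attribute('innerHTML'), but the result is still the same.
Is there any way to solve this?
Code trials: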
from selenium import webdriver
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
browser = webdriver.Chrome()
delay = 10
browser.get("https://www.example.com")
browser.execute_script("window.scrollTo(0,document.body.scrollHeight)")
time.sleep(2)
images =[]
myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.CLASS_NAME, 'item-responsive')))
body = browser.find_element_by_class_name('movie-featured') # body of images container
imageItems = body.find_elements_by_css_selector('div.item.analytics') #list of divs that hold movies images
for item in imageItems:
    rate = item.find_element_by_css_selector('span.moview-rating-summary').text
    images.append(rate)
print(images)
browser.close()
Thank you all for your help. I solved the problem by changing the code as follows:
body = browser.find_element_by_class_name('movie-featured')
rateDivs = body.find_elements_by_xpath('//div[@class="moview-rating-wrapper"]')
ratelist = []
for div in rateDivs:
    span = div.find_element_by_css_selector('span.moview-rating-summary')
    ratespan = span.find_element_by_tag_name('span')
    rate = ratespan.text
    # keep only the non-empty ratings
    if len(rate) > 0:
        ratelist.append(rate)
print(ratelist)
browser.close()
I really appreciate you taking so much time to help me.
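To extract the text, e.g. some text, from all of the <span> elements using Selenium and Python, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies: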
- Using CSS_SELECTOR and get_attribute("innerHTML"):
  print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.movie-rating-summary>span")))])
- Using XPATH and the text attribute:
  print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='movie-rating-summary']/span")))])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Outro
Links to useful documentation:
- The get_attribute() method gets the given attribute or property of the element.
- The text attribute returns the text of the element.
- Difference between text and innerHTML using Selenium
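To illustrate the difference in practice: the text attribute returns only text that is rendered, so it is empty for an element that is not displayed, while get_attribute("textContent") reads the text from the DOM regardless of visibility. The sketch below shows one way to combine the two. element_text is a hypothetical helper name, and whether hidden spans are the actual cause on the asker's page is an assumption.

```python
def element_text(elem):
    """Return the rendered text of elem, falling back to its DOM text.

    elem is expected to behave like a Selenium WebElement, i.e. to offer
    a .text attribute and a .get_attribute(name) method.
    """
    rendered = elem.text  # empty string when the element is not displayed
    if rendered:
        return rendered
    # textContent comes from the DOM, so it does not depend on visibility.
    raw = elem.get_attribute("textContent")
    return raw.strip() if raw else ""
```

If the fallback also returns an empty string, the text is probably not in the DOM yet (for example because it is loaded lazily), and an explicit wait is the more likely fix.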
Try this:
driver.find_element_by_xpath('//span[@class="movie-rating-summary"]/span[1]')