Python Selenium gives me an empty text string from a <span> inside a <span>



I have an HTML page that contains 40 divs like the following:

<div class='movie-featured'>
  <div class="item analytics">
    <div class="movie-details">
      <div class="movie-rating-wrapper">
        <span class="movie-rating-summary">
          <span>some text</span>
        </span>
      </div>
    </div>
  </div>
</div>

I am trying to get the text of the inner <span>some text</span> in each of the 40 divs with: find_element_by_css_selector('span.moview-rating-summary').find_element_by_tag_name('span').text

Output:

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '6/10', '', '', '', '', '', '', '', '', '7.5/10', '', '', '', '', '']

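As you can see, I only get text from a few of the spans, not from all of them.

I have also tried find_element_by_tag_name('span').get_attribute('textContent') and find_element_by_tag_name('span').get_attribute('innerHTML'),

but the result is still the same.

Is there any way to solve this?

Code trials: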

from selenium import webdriver
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
browser = webdriver.Chrome()
delay = 10 
browser.get("https://www.example.com")  # the URL must include its scheme

browser.execute_script("window.scrollTo(0,document.body.scrollHeight)")
time.sleep(2)
images =[]
myElem = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.CLASS_NAME, 'item-responsive')))

body = browser.find_element_by_class_name('movie-featured') # body of images container
imageItems = body.find_elements_by_css_selector('div.item.analytics')  #list of divs that hold movies images

for item in imageItems:
    rate = item.find_element_by_css_selector('span.moview-rating-summary').text
    images.append(rate)

print(images)
browser.close()

Thank you all for your help. I solved the problem by changing the code as follows:

body = browser.find_element_by_class_name('movie-featured')
rateDivs = body.find_elements_by_xpath('//div[@class="moview-rating-wrapper"]')
ratelist = []
for div in rateDivs:
    span = div.find_element_by_css_selector('span.moview-rating-summary')
    ratespan = span.find_element_by_tag_name('span')
    rate = ratespan.text
    if len(rate) > 0:
        ratelist.append(rate)
    else:
        continue
print(ratelist)
browser.close()

I really appreciate you taking so much time to help me.

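To extract the text from all of the <span> elements (e.g. some text) using Selenium and Python, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies: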

  • Using CSS_SELECTOR and get_attribute("innerHTML"):

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.movie-rating-summary>span")))])
    
  • Using XPATH and the text attribute:

    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='movie-rating-summary']/span")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
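
For readers on a recent Selenium 4 release, where the find_element_by_* helpers are no longer available, the same idea could look roughly like the sketch below. The function names extract_ratings and scrape_ratings are hypothetical, the selector is the one from the snippets above, and this version deliberately swaps in presence_of_all_elements_located plus textContent (which does not depend on the element being visible) instead of the visibility wait, so treat it as an assumption to verify against your page rather than a definitive fix.

```python
def extract_ratings(elements):
    """Return the stripped, non-empty textContent of each element.

    `elements` is any iterable of objects exposing get_attribute(name),
    such as Selenium WebElements.
    """
    ratings = []
    for element in elements:
        raw = element.get_attribute("textContent") or ""
        text = raw.strip()
        if text:  # skip spans that hold no text
            ratings.append(text)
    return ratings


def scrape_ratings(url, timeout=20):
    """Open `url` in Chrome and collect the rating texts (hypothetical helper)."""
    # Selenium is imported here so extract_ratings() stays usable without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until at least one matching span is present in the DOM.
        spans = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located(
                (By.CSS_SELECTOR, "span.movie-rating-summary > span")
            )
        )
        return extract_ratings(spans)
    finally:
        driver.quit()  # always release the browser
```

Calling scrape_ratings("https://www.example.com") would then return a list of rating strings, with empty spans already filtered out in the same spirit as the asker's own fix.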
    

Outro

Links to useful documentation:

  • get_attribute() method: Gets the given attribute or property of the element.
  • text attribute: returns the text of the element.
  • Difference between text and innerHTML using Selenium
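
To make the innerHTML versus text distinction concrete without a browser, here is a small offline sketch using only the standard library. It is not Selenium code: text_content and TextContentParser are hypothetical names, and the parser only approximates what the DOM textContent property holds. For the outer span in the question, innerHTML would be the markup string "<span>some text</span>", while the text is just "some text".

```python
from html.parser import HTMLParser


class TextContentParser(HTMLParser):
    """Collects only the text nodes, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_content(inner_html):
    """Approximate textContent for a fragment of innerHTML markup."""
    parser = TextContentParser()
    parser.feed(inner_html)
    parser.close()
    return "".join(parser.chunks)


# innerHTML of the outer span.movie-rating-summary from the question
outer_inner_html = "<span>some text</span>"
print(repr(outer_inner_html))                 # markup, tags included
print(repr(text_content(outer_inner_html)))   # text only
```

In Selenium itself, the text attribute additionally depends on what is rendered, which is one reason it can come back empty while the markup is already in the DOM.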

Try this:

driver.find_element_by_xpath('//span[@class="movie-rating-summary"]/span[1]')
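
If you want to sanity-check that expression before running a browser, a rough offline check against the sample markup from the question is sketched below. It uses xml.etree.ElementTree, which supports only a subset of XPath (hence the leading "." and the well-formed input), so it is an approximation of what the browser does, not a replacement for it.

```python
import xml.etree.ElementTree as ET

# Sample markup copied from the question; it happens to be well-formed XML.
SAMPLE = """
<div class='movie-featured'>
  <div class="item analytics">
    <div class="movie-details">
      <div class="movie-rating-wrapper">
        <span class="movie-rating-summary">
          <span>some text</span>
        </span>
      </div>
    </div>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE.strip())

# Same shape as the XPath above, made relative for ElementTree.
matches = root.findall(".//span[@class='movie-rating-summary']/span[1]")
texts = [m.text for m in matches]
print(texts)
```

Note that @class="..." compares the whole attribute value, so the expression stops matching if the real element carries additional classes.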
