Unable to scrape src with Python Selenium and BeautifulSoup when it is nested in a source tag inside a video element



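I am scraping an anime website as a project, but when I try to get the video's src I get an error. The src is nested inside a source tag within the video element. A screenshot and my code are below.

Example screenshot

Code: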

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import re
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#launch url
url = "https://bestdubbedanime.com/Demon-Slayer-Kimetsu-no-Yaiba/26"
# create a new Firefox session
driver = webdriver.Firefox()
# driver.implicitly_wait(30)
driver.get(url)
# python_button = driver.find_element_by_class_name('playostki') #FHSU
# python_button.click() #click fhsu link
soup1 = BeautifulSoup(driver.page_source, 'html.parser')
video = soup1.find('video', id='my_video_1_html5_api')
# video = driver.find_element_by_id('my_video_1_html5_api')
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".playostki"))).click()      
driver.quit()  # end the WebDriver session and close the browser

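You are not getting the src because the source tag only appears after the video is clicked. You have to click the video first, then read the "src" attribute from the source element.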

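# Reuses the imports from the question's code.
driver = webdriver.Firefox()
driver.maximize_window()
driver.get("https://bestdubbedanime.com/Demon-Slayer-Kimetsu-no-Yaiba/26")
# Click the play overlay first; the <source> tag is only added after the click.
WebDriverWait(driver, 60).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='playostki']//img"))).click()
# Wait until the <source> element exists, then read its src attribute.
print(WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#my_video_1_html5_api > source"))).get_attribute("src"))
driver.quit()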

Output:

https://bestdubbedanime.com/xz/api/v.php?u=eVcxb0ZCUEMraFd1Vi9pM2xqWUhtbXZMWjZ0Mlpoc1U0Tmhqc2VFcVViQUc3VUVhR0pZV1EvaW1nY1duaXBMeXYvUUY4RG5ab3p4MEtEMUFHRmVaN0taVG9sY3ZVcTRoeDZoVHhWLzdiYjQ5UStNN2FYSjJBSWNKL0t5S1hLNGEyVlZqV1BYQ2MwaCsyNWcvak1Db01EMnNtWGwwTTBBVld4MkNER0V3eGNCRXJ0cEY4RHFPclhwbTJpWFBPSmJI
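To see why parsing the page before the click finds nothing, it can help to compare the HTML before and after the click without a browser. The following is only a rough sketch: it uses the standard library's html.parser in place of BeautifulSoup so that it runs offline, and BEFORE_CLICK, AFTER_CLICK, the example.com URL, and the names SourceSrcParser and find_video_sources are all made up for illustration. They are stand-ins for driver.page_source, not the site's real markup.

```python
from html.parser import HTMLParser

VIDEO_ID = "my_video_1_html5_api"

# Hypothetical stand-ins for driver.page_source before and after the click.
BEFORE_CLICK = (
    '<div class="playostki"><img src="play.png"></div>'
    '<video id="my_video_1_html5_api"></video>'
)
AFTER_CLICK = (
    '<video id="my_video_1_html5_api">'
    '<source src="https://example.com/video.mp4" type="video/mp4">'
    '</video>'
)


class SourceSrcParser(HTMLParser):
    """Collect the src of every <source> nested in the <video> with a given id."""

    def __init__(self, video_id):
        super().__init__()
        self.video_id = video_id
        self.inside_video = False
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video" and attrs.get("id") == self.video_id:
            self.inside_video = True
        elif tag == "source" and self.inside_video and attrs.get("src"):
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "video":
            self.inside_video = False


def find_video_sources(html, video_id):
    """Return the list of src values found inside the target <video>."""
    parser = SourceSrcParser(video_id)
    parser.feed(html)
    parser.close()
    return parser.sources


if __name__ == "__main__":
    print(find_video_sources(BEFORE_CLICK, VIDEO_ID))  # []
    print(find_video_sources(AFTER_CLICK, VIDEO_ID))   # ['https://example.com/video.mp4']
```

With Selenium, the same check should work on driver.page_source as long as it is captured after the wait for "#my_video_1_html5_api > source" has succeeded. If you prefer to stay with BeautifulSoup, re-parsing that later page_source and calling select_one with the same CSS selector would be the analogous step.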
