I'm new to web scraping and I have a question.
I want to get the course names from a specific Udemy search results page (this link: https://www.udemy.com/courses/search/?src=ukw&q=veri+bilimi).
Here is my code:
import requests
from bs4 import BeautifulSoup
result = requests.get("https://www.udemy.com/courses/search/?src=ukw&q=veri+bilimi")
print(result.status_code)
src = result.content
soup = BeautifulSoup(src, "lxml")
print(soup.find("div", attrs={"class":"udlite-focus-visible-target udlite-heading-md course-card--course-title--2f7tE"}))
It prints "None" instead of the course name. Unfortunately, I can't see what I'm doing wrong.
Can you help me?
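The Udemy site loads the course titles with JavaScript, so they are not in the HTML that requests receives, and find() returns None. You need a tool that runs the page's JavaScript, such as Selenium.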
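To see why find() comes back empty, here is a minimal sketch that runs the same lookup on two HTML strings. Both strings are made-up stand-ins, not real Udemy responses: one imitates the bare shell a plain HTTP request gets, the other imitates the DOM after a browser has run the scripts. The helper name extract_titles is hypothetical, and the built-in "html.parser" is used here only so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Class string copied from the question; Udemy may change it at any time.
TITLE_CLASS = (
    "udlite-focus-visible-target udlite-heading-md "
    "course-card--course-title--2f7tE"
)


def extract_titles(html):
    """Return the text of every title <div> present in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        div.get_text(strip=True)
        for div in soup.find_all("div", attrs={"class": TITLE_CLASS})
    ]


# Made-up stand-in for what requests receives: an empty shell plus a script tag.
static_html = '<div id="app"></div><script src="bundle.js"></script>'

# Made-up stand-in for the DOM after a browser has executed the script.
rendered_html = (
    '<div id="app"><div class="' + TITLE_CLASS + '">Example course</div></div>'
)

print(extract_titles(static_html))    # []
print(extract_titles(rendered_html))  # ['Example course']
```

A quick way to check this against the real page is to search result.text for part of the class name: if it is not there, no parser setting will find it.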
import time

from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.udemy.com/courses/search/?src=ukw&q=veri+bilimi"

# Use a name that does not shadow the imported webdriver module
driver = webdriver.Chrome()
driver.get(url)
time.sleep(6)  # give the page's JavaScript 6 seconds to render the results
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()  # close the browser once the rendered HTML has been captured

course_titles = soup.find_all("div", attrs={"class": "udlite-focus-visible-target udlite-heading-md course-card--course-title--2f7tE"})
for title in course_titles:
    print(title.get_text())
If you don't have Selenium yet, you will need to install it first (for example with pip install selenium).