I am trying to scrape a video from https://www.watchcartoononline.com/bobs-burgers-season-9-episode-3-tweentrepreneurs.
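I don't know how to extract the video URL from this site. Using the Chrome and Firefox developer tools I determined that the video sits in an iframe, but searching for iframes with BeautifulSoup and extracting their src URLs returns links that have nothing to do with the video. Where are the references to the mp4 or flv files? (I can see these files in the developer tools, although clicking on them is forbidden.)
Any insight into how to scrape videos with BeautifulSoup and requests would be appreciated.
Here is some code, if needed. Many tutorials say to use 'a' tags, but I did not get any 'a' tags back.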
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.watchcartoononline.com/bobs-burgers-season-9-episode-5-live-and-let-fly")
soup = BeautifulSoup(r.content,'html.parser')
links = soup.find_all('iframe')
for link in links:
    print(link['src'])
import requests

url = "https://disk19.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e03.mp4?st=_EEVz36ktZOv7ZxlTaXZfg&e=1541637622"

def download_file(url, filename):
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                #f.flush() commented by recommendation from J.F.Sebastian
    return filename

download_file(url, "bobs.burgers.s09e03.mp4")
This code will download the episode to your computer. The video URL is nested in a <source> tag inside the <video> tag.
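Background information
(Scroll down for the answer)
Getting a video is only easy when the site you are scraping states the video file explicitly in its HTML. For example, suppose you want to fetch an .mp4 file from a site by referencing its .mp4 URL, and take this page as the example: https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314
If we look for <video> in inspect element, there is a src that contains the .mp4.
Now, suppose we try to scrape the .mp4 URL from this site like this: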
import requests
from bs4 import BeautifulSoup
html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')
for mp4 in soup.find_all('video'):
    mp4 = mp4['src']
print(mp4)
This fails with KeyError: 'src'. The actual video URL is stored on the nested source element, not on video itself, which we can see by printing what soup.find_all('video') returns:
import requests
from bs4 import BeautifulSoup
html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')
for mp4 in soup.find_all('video'):
    print(mp4)
Output:
<video class="video-js vjs-default-skin vjs-big-play-centered" controls="" data-setup="{}" height="264" id="example_video_1" poster="" preload="none" width="640">
<source src="https://mountainoservo0002.animecdn.com/Yakunara-Mug-Cup-mo/Yakunara-Mug-Cup-mo-Episode-01.1-1080p.mp4" type="video/mp4"/>
</video>
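The same point can be checked offline, without requesting the page. The following is only a minimal sketch: it uses Python's standard-library html.parser instead of the BeautifulSoup code in this answer, the class name VideoSrcCollector is made up for illustration, and the sample markup is a shortened copy of the output above.

```python
from html.parser import HTMLParser

# Shortened copy of the markup printed above (attributes trimmed for brevity).
SAMPLE_HTML = """
<video class="video-js" controls="" id="example_video_1" preload="none">
<source src="https://mountainoservo0002.animecdn.com/Yakunara-Mug-Cup-mo/Yakunara-Mug-Cup-mo-Episode-01.1-1080p.mp4" type="video/mp4"/>
</video>
"""


class VideoSrcCollector(HTMLParser):
    """Hypothetical helper: records src attributes of <video> and <source> tags."""

    def __init__(self):
        super().__init__()
        self.video_srcs = []   # src values found directly on <video>
        self.source_srcs = []  # src values found on <source>

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <source .../>.
        src = dict(attrs).get("src")
        if src is None:
            return
        if tag == "video":
            self.video_srcs.append(src)
        elif tag == "source":
            self.source_srcs.append(src)


collector = VideoSrcCollector()
collector.feed(SAMPLE_HTML)
collector.close()

print("src on <video>:", collector.video_srcs)
print("src on <source>:", collector.source_srcs)
```

In this sample the list for <video> stays empty while the list for <source> holds the .mp4 URL, which matches the KeyError seen earlier.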
So, to download the .mp4 now, we use the source element and read its src.
import requests
import shutil  # - - Provides copyfileobj, which copies one file-like object into another
from bs4 import BeautifulSoup  # - - We could honestly do this without soup

# - - Get the url of the site you want to scrape
html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')

# - - Get the .mp4 url and the filename (with several <source> tags, the last one wins)
for vid in soup.find_all('source'):
    url = vid['src']
    filename = vid['src'].split('/')[-1]

# - - Get the video
response = requests.get(url, stream=True)

# - - Make sure the status is OK
if response.status_code == 200:
    # - - Let urllib3 undo any gzip/deflate Content-Encoding while the raw stream is read
    response.raw.decode_content = True

    with open(filename, 'wb') as f:
        # - - Copy what's in response.raw and transfer it into the file
        shutil.copyfileobj(response.raw, f)
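(Obviously, you could simplify this by copying the source's src by hand and using it as the base URL, without html_url at all. I just wanted to show that you can also get at the .mp4, i.e. the source's src, from code.)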
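One detail worth noting: taking the last '/'-separated piece of the src as the filename works for the URL above, but a URL with a query string (like the cizgifilmlerizle ones in this thread) would leave "?st=...&e=..." in the name. A minimal sketch of a more defensive version, using only the standard library; the helper name filename_from_url and the fallback name "video.mp4" are assumptions for illustration, not part of the code above:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="video.mp4"):
    """Hypothetical helper: last path segment of a URL, ignoring ?query and #fragment."""
    path = urlsplit(url).path                  # drops scheme, host, query and fragment
    name = posixpath.basename(unquote(path))   # last segment, percent-decoding undone
    return name or default                     # fall back when the path ends in '/'


print(filename_from_url(
    "https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4"
    "?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876"
))
```

With the sample URL this yields bobs.burgers.s09e05.mp4, whereas a plain split('/')[-1] would keep the query string attached.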
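Again, not every site is this clear-cut. With this particular site we were lucky that it is so manageable. Other sites you may want to scrape videos from can require you to switch from Elements (in inspect element) to Network. There you have to try to fetch the individual segments behind the embedded link and download them to assemble the full video. That is not always easy, but for the video on the site you asked about, it is.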
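As a rough illustration of the segment case, here is a sketch that assumes the Network tab shows an HLS-style .m3u8 playlist. That is purely an assumption: the sites discussed here are not claimed to use it, and the playlist text, the example.com URL, and the function name segment_urls are all made up. In such a playlist, lines starting with '#' are tags and every other non-blank line is a segment URI.

```python
from urllib.parse import urljoin


def segment_urls(playlist_text, playlist_url):
    """Return absolute segment URLs from a simple media playlist, in order."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Blank lines and lines starting with '#' (tags/comments) are not segments.
        if not line or line.startswith("#"):
            continue
        # Relative segment names are resolved against the playlist's own URL.
        urls.append(urljoin(playlist_url, line))
    return urls


# Made-up playlist for illustration only.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg-0.ts
#EXTINF:10.0,
seg-1.ts
#EXT-X-ENDLIST
"""

for url in segment_urls(SAMPLE_PLAYLIST, "https://example.com/video/index.m3u8"):
    print(url)
```

Each URL could then be requested in order and its bytes appended to one output file. This sketch does not handle encrypted segments or playlists that only point to other playlists.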
Your answer
Go to inspect element, click the Chromecast Player (2. Player) located above the video to see its HTML attributes, and finally click the embed that looks like this:
/inc/embed/embed.php?file=bobs.burgers.s09e05.flv&hd=1&pid=437035&h=25424730eed390d0bb4634fa93a2e96c&t=1618011716&embed=cizgi
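Once that is done, click play while keeping inspect element open, click the video to see its attributes (or press ctrl+f and filter for <video>), then copy the src, which should look like this: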
https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876
Now we can download it with Python.
import requests
# - - Provides copyfileobj, which copies one file-like object into another
import shutil

url = "https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876"
response = requests.get(url, stream=True)

if response.status_code == 200:
    # - - Let urllib3 undo any gzip/deflate Content-Encoding while the raw stream is read
    response.raw.decode_content = True

    with open('bobs-burgers.mp4', 'wb') as f:
        # - - Take the data from response.raw and transfer it to the file
        shutil.copyfileobj(response.raw, f)
    print('downloaded file')
else:
    print('Download failed')
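If the download fails some time after the URL was copied, one possible explanation is that the link has expired. The copied URL carries st and e query parameters that this answer does not explain. The sketch below rests on the unconfirmed assumption that e is a Unix timestamp after which the link stops working; the function names are made up for illustration.

```python
import time
from urllib.parse import parse_qs, urlsplit


def expiry_timestamp(url):
    """Return the `e` query parameter as an int, or None if absent or non-numeric.

    ASSUMPTION: `e` is a Unix expiry time; nothing in the thread confirms this.
    """
    values = parse_qs(urlsplit(url).query).get("e")
    if not values:
        return None
    try:
        return int(values[0])
    except ValueError:
        return None


def looks_expired(url, now=None):
    """True if the assumed expiry time is at or before `now` (default: current time)."""
    expires = expiry_timestamp(url)
    if expires is None:
        return False  # nothing to go on, so do not claim the link expired
    current = time.time() if now is None else now
    return current >= expires


url = "https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876"
print(expiry_timestamp(url))
print(looks_expired(url))
```

If the assumption holds, a True result would suggest going back to the page and copying a fresh src before retrying the download.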