How to download images inside nested tags from a website



I want to download all the images in img tags that are nested inside something like this:

<div id="onlive">
  <div>
    <section class="class1">
      <ul class=="class2">
        <li>
          <div class="class3">
            <div class="class4 class4-001" video_id="001">
              <div class="class5">
                <img src="https://...">
              </div>
            </div>
          </div>
        </li>
        <li>
          <div class="class3">
            <div class="class4 class4-002" video_id="002">
              <div class="class5">
                <img src="https://...">
              </div>
            </div>
          </div>
        </li>
        <li>...</li>
        <li>...</li>
        <li>...</li>
      </ul>
    </section>
  </div>
</div>

In this example there should be 5 images to download and save in the "images" directory. I would also like to use the "video_id" as the name of each image.

Here is my code. It runs without errors, but it doesn't get any images:

import requests
from bs4 import BeautifulSoup
import os
import logging
import urllib.request

url = "https://www...com/onlive"
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
links = []
for img in soup.find_all('img'):
    link = img.get('src')
    links.append(link)
for i in range(len(links)):
    filename = 'images/img{}.jpg'.format(i)
    urllib.request.urlretrieve(links[i], filename)

Based strictly on the sample HTML in the question, this should work for the relevant part of your code:

from bs4 import BeautifulSoup as bs

videos = """your html above, fixed"""  # the HTML in the question is malformed
soup = bs(videos, 'lxml')
# match every div whose class attribute contains "class4"
targets = soup.select('div[class*="class4"]')
for target in targets:
    i = target.attrs['video_id']  # e.g. "001"
    link = target.select_one('img').attrs['src']
    filename = f'images/img{i}.jpg'
    print(filename, link)

Output:

images/img001.jpg https://...
images/img002.jpg https://...
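
To go from printing the pairs to actually saving the files, something along these lines might work. It is only a sketch under a few assumptions: the helper names collect_images and download_images and the base_url parameter are made up for illustration, every image is assumed to be a .jpg (as in the question's code), and the selector targets any div that carries a video_id attribute rather than matching on the class name.

```python
import os
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_images(html, base_url=""):
    """Return (filename, image_url) pairs, one per div that has a video_id.

    Hypothetical helper; base_url is only used to resolve relative src values.
    """
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # div[video_id] matches every div that carries a video_id attribute
    for div in soup.select("div[video_id]"):
        img = div.select_one("img[src]")
        if img is None:
            continue  # skip entries that have no image
        # Assumption: images are JPEGs, as in the question's code
        filename = f"img{div['video_id']}.jpg"
        pairs.append((filename, urljoin(base_url, img["src"])))
    return pairs


def download_images(pairs, out_dir="images"):
    """Save each image as out_dir/filename (hypothetical helper)."""
    # urlretrieve does not create missing directories, so make it first
    os.makedirs(out_dir, exist_ok=True)
    for filename, link in pairs:
        urllib.request.urlretrieve(link, os.path.join(out_dir, filename))
```

With the question's code this would be called as download_images(collect_images(page.text, url)). If collect_images returns an empty list for the live page, the img tags may not be present in the HTML that requests receives (for example, if the page fills them in with JavaScript), in which case no selector will find them.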
