Download all images in a web directory



I'm trying to collect all the images in a specific directory on a web server using BeautifulSoup4.

This is the code I have so far:

from init import *
from bs4 import BeautifulSoup
import urllib
import urllib.request

# use this image scraper from the location that
# you want to save scraped images to
def make_soup(url):
    html = urllib.request.urlopen(url)
    return BeautifulSoup(html, features="html.parser")

def get_images(url):
    soup = make_soup(url)
    # this makes a list of bs4 element tags
    images = [img for img in soup.findAll('img')]
    print(str(len(images)) + "images found.")
    print('Downloading images to current working directory.')
    # compile our unicode list of image links
    image_links = [each.get('src') for each in images]
    for each in image_links:
        filename = each.split('/')[-1]
        urllib.request.Request(each, filename)
    return image_links

# a standard call looks like this
get_images('https://omabilder.000webhostapp.com/img/')

However, it spits out the following error:

7images found.
Downloading images to current working directory.
Traceback (most recent call last):
  File "C:\Users\MyPC\Desktop\oma projekt\getpics.py", line 1, in <module>
    from init import *
  File "C:\Users\MyPC\Desktop\oma projekt\init.py", line 9, in <module>
    from getpics import *
  File "C:\Users\MyPC\Desktop\oma projekt\getpics.py", line 26, in <module>
    get_images('https://omabilder.000webhostapp.com/img/')
  File "C:\Users\MyPC\Desktop\oma projekt\getpics.py", line 22, in get_images
    urllib.request.Request(each, filename)
  File "C:\Users\MyPC\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 328, in __init__
    self.full_url = url
  File "C:\Users\MyPC\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 354, in full_url
    self._parse()
  File "C:\Users\MyPC\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 383, in _parse
    raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: '/icons/blank.gif'

What I don't understand is the following:

There is no GIF in the directory, and no /icons/ subdirectory either. Moreover, it reports 7 images found, while only 3 were uploaded to the site.

The GIFs are the icons next to the links on the website (tiny, roughly 20x20 pixel images). They really are displayed on the page. If I understand correctly, what you want to download are the PNG images; those are links on the page you provided, not `<img>` tags.
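The `ValueError` itself comes from passing a site-relative path like `/icons/blank.gif` straight to `urllib`, which only accepts absolute URLs with a scheme. A small sketch showing how `urllib.parse.urljoin` turns such a `src` into a fetchable URL (the base URL is the one from the question):

```python
from urllib.parse import urljoin

base = 'https://omabilder.000webhostapp.com/img/'

# A site-relative src has no scheme, which is exactly why urllib
# raises "unknown url type: '/icons/blank.gif'" when asked to fetch it.
# urljoin resolves it against the page URL into an absolute URL.
print(urljoin(base, '/icons/blank.gif'))
# -> https://omabilder.000webhostapp.com/icons/blank.gif

# An already-absolute src passes through unchanged.
print(urljoin(base, 'https://example.com/a.png'))
# -> https://example.com/a.png
```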

If you want to download the PNG images from the links, you can use something like this:

from bs4 import BeautifulSoup
import urllib
import urllib.request
import os

# use this image scraper from the location that
# you want to save scraped images to
def make_soup(url):
    html = urllib.request.urlopen(url)
    return BeautifulSoup(html, features="html.parser")

def get_images(url):
    soup = make_soup(url)
    # get all links ("a" tags with an href attribute)
    images = [link["href"] for link in soup.find_all('a', href=True)]
    # keep the ones that end with png
    images = [im for im in images if im.endswith(".png")]
    print(str(len(images)) + " images found.")
    print('Downloading images to current working directory.')
    # download each image next to the script
    for each in images:
        urllib.request.urlretrieve(os.path.join(url, each), each)
    return images

# a standard call looks like this
get_images('https://omabilder.000webhostapp.com/img/')
