Python web scraper won't save image files



I started working on a small image-scraping terminal program that should save images to a specified folder within the program's directory tree. It's based on a basic tutorial I found online. However, the program crashes whenever I enter a search term in the terminal to start scraping bing.com (yes, I know). The errors I get seem to center on the image file type not being recognized, or the file path where the images would be saved not being recognized:

from bs4 import BeautifulSoup
import requests
from PIL import Image
from io import BytesIO
search = input("Search for:")
params = {"q": search}
r = requests.get("http://www.bing.com/images/search", params=params)
soup = BeautifulSoup(r.text, "html.parser")
links = soup.findAll("a", {"class": "thumb"})
for item in links:
    img_obj = requests.get(item.attrs["href"])
    print("Getting", item.attrs["href"])
    title = item.attrs["href"].split("/")[-1]
    img = Image.open(BytesIO(img_obj.content))
    img.save("./scraped_images/" + title, img.format)

Error raised: Exception has occurred: FileNotFoundError [Errno 2] No such file or directory: './scraped_images/3849747391_4a7dc3f19e_b.jpg'

I tried adding a file path variable (using pathlib) and concatenating it with the other necessary variables:

from bs4 import BeautifulSoup
import requests
from PIL import Image
from io import BytesIO
from pathlib import Path
image_folder = Path("./scraped_images/")
search = input("Search for:")
params = {"q": search}
r = requests.get("http://www.bing.com/images/search", params=params)
soup = BeautifulSoup(r.text, "html.parser")
links = soup.findAll("a", {"class": "thumb"})
for item in links:
    img_obj = requests.get(item.attrs["href"])
    print("Getting", item.attrs["href"])
    title = item.attrs["href"].split("/")[-1]
    img = Image.open(BytesIO(img_obj.content))
    img.save(image_folder + title, img.format)

Error raised: Exception has occurred: TypeError unsupported operand type(s) for +: 'WindowsPath' and 'str'
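For reference, this TypeError happens because pathlib paths are combined with the `/` operator rather than string concatenation with `+`. A minimal sketch (the filename here is just a stand-in):

```python
from pathlib import Path

image_folder = Path("./scraped_images")
title = "example.jpg"  # stand-in filename, not from the original code

# Path objects join with the / operator, not with +:
target = image_folder / title

# str() gives the plain string form when an API requires one:
print(str(target))
```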

I checked the documentation for PIL, BeautifulSoup, etc. to see whether any updates might have tripped me up, inspected the elements on Bing to verify the class is correct, and even tried searching by different classes, but no luck. I'm at a loss. Any ideas or guidance are appreciated. Thanks.

I made a few changes to your code:

from bs4 import BeautifulSoup
import requests
from pathlib import Path
import os
image_folder = Path("./scraped_images/")
if not os.path.isdir(image_folder):
    print('Making %s' % (image_folder))
    os.mkdir(image_folder)
search = input("Search for:")
params = {"q": search}
r = requests.get("http://www.bing.com/images/search", params=params)
soup = BeautifulSoup(r.text, "html.parser")
links = soup.findAll("a", {"class": "thumb"})
for item in links:
    img_obj = requests.get(item.attrs["href"])
    print("Getting", item.attrs["href"])
    title = item.attrs["href"].split("/")[-1]
    if img_obj.ok:
        with open('%s/%s' % (image_folder, title), 'wb') as file:
            file.write(img_obj.content)

You could use PIL, but you don't need it in this case.
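Since pathlib is already imported, the directory check and creation can also be done with `Path.mkdir`, which avoids mixing `os` and `pathlib`. A sketch using the same folder name:

```python
from pathlib import Path

image_folder = Path("./scraped_images/")
# parents=True creates any missing parent folders; exist_ok=True makes
# the call a no-op when the directory already exists.
image_folder.mkdir(parents=True, exist_ok=True)
```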

I also improved the PIL version of the code so it works better:

from bs4 import BeautifulSoup
import requests
from PIL import Image
from io import BytesIO
from pathlib import Path
s = requests.Session()
image_folder = Path("./scraped_images/")
search = input("Search for:")
params = {"q": search}
r = s.get("http://www.bing.com/images/search", params=params)
soup = BeautifulSoup(r.text, "html.parser")
links = soup.findAll("a", {"class": "thumb"})
for item in links:
    try:
        img_obj = s.get(item.attrs["href"], headers={'User-Agent': 'Mozilla/5.0'})
        if img_obj.ok:
            print("Getting", item.attrs["href"])
            title = item.attrs["href"].split("/")[-1]
            if '?' in title:
                title = title.split('?')[0]
            img = Image.open(BytesIO(img_obj.content))
            img.save(str(image_folder) + '/' + title, img.format)
        else:
            continue
    except OSError:
        print('\nError downloading %s, try to visit'
              '\n%s\n'
              'manually and try to get the image manually.\n' % (title, item.attrs["href"]))

I used a requests session and added a try/except in case PIL can't create the image. I also only attempt to save an image if the request got a 200 response from the site.
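The session idea can be taken one step further by setting the default headers once on the session instead of passing them on every call. `save_image` below is a hypothetical helper name, not part of the original code:

```python
import requests
from pathlib import Path

s = requests.Session()
# Headers set on the session apply to every subsequent s.get() call.
s.headers.update({"User-Agent": "Mozilla/5.0"})

def save_image(session, url, folder):
    """Download url and write the raw bytes; return the path, or None on failure."""
    resp = session.get(url)
    if not resp.ok:  # skip anything that isn't a 2xx response
        return None
    # Derive a filename from the URL, dropping any query string.
    title = url.split("/")[-1].split("?")[0]
    target = Path(folder) / title
    target.write_bytes(resp.content)
    return target
```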
