How do I save an image stream from the web?

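I want to automatically download images from a web source that serves them as a stream, encoded as a Base64 string. Google Chrome correctly recognizes the data from the source as a JPG image and displays it.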

However, the page is only accessible to registered users. Should I use Selenium in this case?

So, basically, I want to send about 1000 URL requests and save all the streamed images to my local disk.
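A common alternative to driving every page through Selenium is to log in once with Selenium and then reuse the session cookies for plain HTTP requests. The sketch below assumes the site authenticates by cookies alone; `cookie_header` and `fetch_with_cookies` are hypothetical helpers. The input format matches what Selenium's `driver.get_cookies()` returns: a list of dicts with `name` and `value` keys.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from a list of {'name': ..., 'value': ...} dicts,
    such as the list returned by Selenium's driver.get_cookies()."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def fetch_with_cookies(url, selenium_cookies, timeout=30):
    """Download the raw response body of `url`, sending the browser's cookies.

    Assumption: the site accepts cookie-based auth outside the browser.
    """
    request = urllib.request.Request(
        url,
        headers={
            "Cookie": cookie_header(selenium_cookies),
            # Some servers reject the default urllib User-Agent.
            "User-Agent": "Mozilla/5.0",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Usage: after logging in with Selenium, call `cookies = driver.get_cookies()` once, then `data = fetch_with_cookies(url, cookies)` for each page, with no browser round trip per image.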

An example of a URL I request:

https://ia800703.us.archive.org/BookReader/BookReaderImages.php?zip=/10/items/nortonreaderan6theast/nortonreaderan6theast_jp2.zip&file=nortonreaderan6theast_jp2/nortonreaderan6theast_1257.jp2&scale=1&rotate=0
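If the page number is the only part of the URL that changes, the 1000 URLs can be generated from a template. This sketch assumes that page files follow the `nortonreaderan6theast_NNNN.jp2` pattern with a four-digit, zero-padded page number, which is inferred from the single example above; `page_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://ia800703.us.archive.org/BookReader/BookReaderImages.php"
ITEM = "nortonreaderan6theast"


def page_url(page, scale=1, rotate=0):
    """Build the BookReaderImages.php URL for one page (pattern assumed from one example)."""
    params = {
        "zip": f"/10/items/{ITEM}/{ITEM}_jp2.zip",
        # Assumption: page numbers are zero-padded to four digits.
        "file": f"{ITEM}_jp2/{ITEM}_{page:04d}.jp2",
        "scale": scale,
        "rotate": rotate,
    }
    # safe="/" keeps the slashes as-is instead of percent-encoding them as %2F.
    return BASE + "?" + urlencode(params, safe="/")
```

For example, `urls = [page_url(n) for n in range(1, 1001)]` would produce the list to loop over.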

The response is an HTML document containing the image:

<html>
<head>
<meta name="viewport" content="width=device-width, minimum-scale=0.1">
<title>BookReaderImages.php (2447×4005) </title>
</head>
<body style="margin: 0px; background: #0e0e0e;">
<img style="-webkit-user-select: none;cursor: zoom-in;" src="https://ia800703.us.archive.org/BookReader/BookReaderImages.php?zip=/10/items/nortonreaderan6theast/nortonreaderan6theast_jp2.zip&file=nortonreaderan6theast_jp2/nortonreaderan6theast_1257.jp2&scale=1&rotate=0" width="556" height="911">
</body>
</html>
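The markup above looks like the wrapper page Chrome generates when it displays an image URL directly (the `<title>` showing the pixel size and the `cursor: zoom-in` style are typical of it), so the HTTP response itself is probably the raw image rather than HTML. If a page does return HTML like this, the image address can be extracted from the `<img>` tag with the standard library's `html.parser`. `ImgSrcParser` and `image_sources` are hypothetical names.

```python
from html.parser import HTMLParser


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_sources(html_text):
    """Return the list of <img src> values; HTMLParser decodes entities such as &amp;."""
    parser = ImgSrcParser()
    parser.feed(html_text)
    parser.close()
    return parser.sources
```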

The image stream is a Base64 string. The browser lets me save it as nortonreaderan6theast_1257.jpg.
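Whether the payload arrives as raw bytes or as a Base64 string, the file type can be checked from its leading "magic" bytes before saving, instead of trusting the `.jp2` in the URL. A minimal sketch, assuming the payload is raw bytes, a plain Base64 string, or a `data:` URI. `decode_payload`, `sniff_extension` and `save_image` are hypothetical helpers, and only the JPEG, PNG and JPEG 2000 signatures are checked.

```python
import base64
from pathlib import Path

# File signatures ("magic bytes") for the formats this sketch recognises.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", ".jp2"),
]


def decode_payload(payload):
    """Return raw bytes from raw bytes, a Base64 string, or a data: URI."""
    if isinstance(payload, bytes):
        return payload
    if payload.startswith("data:"):
        # e.g. "data:image/jpeg;base64,/9j/4AAQ..." -> keep the part after the comma
        payload = payload.split(",", 1)[1]
    return base64.b64decode(payload)


def sniff_extension(data):
    """Guess a file extension from the first bytes; '.bin' if unknown."""
    for magic, ext in SIGNATURES:
        if data.startswith(magic):
            return ext
    return ".bin"


def save_image(payload, stem, out_dir="."):
    """Decode the payload and write it as <out_dir>/<stem><detected extension>."""
    data = decode_payload(payload)
    path = Path(out_dir) / (stem + sniff_extension(data))
    path.write_bytes(data)
    return path
```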

Any suggestions?

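I managed to implement a working solution, though it is far from ideal. It uses Selenium, chromedriver and the Chrome extension Click and Save. First, once a browser instance has started, I have to install the extension manually. Then I log in to the website and open the book I am about to download. I have to repeat these steps every time I create a new instance.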

In the loop over all the pages (URLs), I use:

driver.get(url)  # Selenium method
# The Click and Save extension automatically detects the picture and saves it
# to the Downloads directory (or another configured one) on Windows
while not os.path.exists(file_path):  # wait until the file has been created
    time.sleep(0.5)
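The polling loop above waits forever if a download never starts or fails. A bounded variant that gives up after a timeout, and only accepts a non-empty file, could look like this; `wait_for_file` is a hypothetical helper and the default timeout of 30 seconds is an arbitrary choice.

```python
import os
import time


def wait_for_file(path, timeout=30.0, poll=0.2):
    """Poll until `path` exists and is non-empty; return False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if os.path.getsize(path) > 0:
                return True
        except OSError:
            pass  # file not created yet
        time.sleep(poll)
    return False
```

Inside the loop, `if not wait_for_file(file_path): print("timed out:", url)` lets the run continue past a page that failed instead of hanging.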

Overall, the process is very slow: roughly 1000 pages per hour. Any improvements are welcome.
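If the pages can be fetched outside the browser (for example with reused session cookies), most of the time goes into waiting on the network, so running several requests in parallel with a thread pool is a common way to speed things up. A sketch under that assumption: `download_all` is hypothetical, `fetch` is any function that returns the bytes for a URL, and `max_workers=8` is an arbitrary value that is best kept modest to avoid being rate-limited or blocked.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(jobs, fetch, out_dir, max_workers=8):
    """Download (url, filename) pairs concurrently.

    `fetch(url)` must return the response body as bytes. Files that already
    exist are skipped, so an interrupted run can simply be restarted.
    Returns a dict mapping each failed URL to its exception.
    """
    os.makedirs(out_dir, exist_ok=True)
    pending = [(url, os.path.join(out_dir, name)) for url, name in jobs
               if not os.path.exists(os.path.join(out_dir, name))]
    failures = {}

    def worker(url, path):
        data = fetch(url)  # fetch first, so a failed request leaves no file behind
        with open(path, "wb") as f:
            f.write(data)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, url, path): url for url, path in pending}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as exc:  # keep going; report failures at the end
                failures[futures[future]] = exc
    return failures
```

Skipping existing files makes retries cheap: rerunning with the same job list only fetches the pages that are still missing.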
