Python: organizing a CSV of URLs, splitting each URL onto its own row, and downloading the image at each URL



I'm having trouble organizing a CSV file full of URLs and downloading the image at each URL.

https://i.stack.imgur.com/KWV26.jpg

It's a mess right now, but the goal is to:

  1. write the src of each image to a CSV file, with each URL split onto its own row
  2. and download each image

from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
import urllib.request
import pandas as pd
import requests
import urllib
import csv


# BeautifulSoup4 findAll src from img


print('Downloading URLs to file')
sleep(1)
with open('output.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(srcs)


print('Downloading images to folder')
sleep(1)
filename = "output"
with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate over all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        # check if we have an image URL
        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.request.urlretrieve(splitted_line[1], "img_" + str(i) + ".png")
            print("Image saved for {0}".format(splitted_line[0]))
            i += 1
        else:
            print("No result for {0}".format(splitted_line[0]))

Based on the limited information you provided, I think this is the code you need:

import requests

with open('output.csv', 'r') as file:
    oldfile = file.read()
linkslist = oldfile.replace("\n", "")  # because your file is wrongly split by new lines, remove them first
links = linkslist.split(",")

with open('new.csv', 'w') as file:  # writing all your links to a new file; this could be combined with the code below, but opening the file and making requests at the same time would make it slower
    for link in links:
        file.write(link + "\n")

for i, link in enumerate(links):
    response = requests.get(link)  # this is to save the image
    file = open("img_{0}.png".format(i), "wb")  # numbered so each picture gets its own file; rename as you like
    file.write(response.content)
    file.close()

Please see the comments in the code for explanations, and if you have any questions, just ask. I haven't tested it since I don't have your exact CSV, but it should work.
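
If the rows of output.csv are well formed, the manual replace/split can also be avoided by reading the file with the csv module, and the download loop benefits from some basic error handling. A minimal sketch (the timeout value and the img_N.png filename pattern are my own choices, not from the answer above):

import csv
import requests

links = []
with open('output.csv', 'r', newline='') as csvfile:
    for row in csv.reader(csvfile):
        links.extend(cell for cell in row if cell)  # keep only non-empty cells

for i, link in enumerate(links):
    try:
        response = requests.get(link, timeout=10)
        response.raise_for_status()  # raise on HTTP error statuses
    except requests.RequestException as exc:
        print("Failed to download {0}: {1}".format(link, exc))
        continue
    with open("img_{0}.png".format(i), "wb") as imgfile:
        imgfile.write(response.content)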
