I'm having trouble organizing a CSV file full of URLs and downloading each image by its URL.
https://i.stack.imgur.com/KWV26.jpg
It's a mess right now, but the goal is to:
- write the src of each of these images to a CSV file, one URL per line (see the sketch after this list)
- download each image
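To make the first goal concrete, this is roughly what writing one URL per row looks like (a minimal sketch; srcs stands in for the scraped list, and the example URLs are placeholders):

import csv

# Placeholder data: in the real script srcs comes from the scraping step
srcs = [
    "https://example.com/a.png",
    "https://example.com/b.png",
]

with open('output.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    for src in srcs:
        writer.writerow([src])  # one URL per row

Here is my attempt so far: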
from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
import urllib.request
import csv

# BeautifulSoup4: findAll the src of every img on the loaded page
# (the driver setup and target URL are placeholders for my actual page)
driver = webdriver.Chrome()
driver.get('https://example.com')
soup = BeautifulSoup(driver.page_source, 'html.parser')
srcs = [img['src'] for img in soup.findAll('img')]
driver.quit()

print('Downloading URLs to file')
sleep(1)
with open('output.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(srcs)

print('Downloading images to folder')
sleep(1)
filename = "output"
with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate over all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        # check whether we have an image URL
        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.request.urlretrieve(splitted_line[1], "img_" + str(i) + ".png")
            print("Image saved for {0}".format(splitted_line[0]))
            i += 1
        else:
            print("No result for {0}".format(splitted_line[0]))
Based on the limited resources you provided, I think this is the code you need:
import requests

with open('output.csv', 'r') as file:
    oldfile = file.read()
linkslist = oldfile.replace("\n", "")  # your file is split across lines where it shouldn't be, so strip the newlines first
links = linkslist.split(",")

with open('new.csv', 'w') as file:  # write all your links to a new file, one per line; this could be merged with the download loop below, but I think keeping the file open while making requests would slow it down
    for link in links:
        file.write(link + "\n")

for i, link in enumerate(links):
    response = requests.get(link)  # this is to save the image
    file = open("img_{0}.png".format(i), "wb")  # replace this with whatever name you want for each picture; the counter keeps the names unique
    file.write(response.content)
    file.close()
Please see the explanatory comments in the code, and if you have any questions just ask. I haven't tested it since I don't have your exact CSV, but it should work.
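If you'd rather name each file after its URL instead of a counter, one option is to reuse the last path segment (a sketch; it assumes each URL path ends in a usable filename, and the example link is just a placeholder):

import os
from urllib.parse import urlsplit
import requests

links = ["https://i.stack.imgur.com/KWV26.jpg"]  # placeholder list

for i, link in enumerate(links):
    # use the last path segment as the filename, falling back to a counter
    name = os.path.basename(urlsplit(link).path) or "img_{0}.png".format(i)
    response = requests.get(link)
    with open(name, "wb") as f:
        f.write(response.content)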