'[Errno 13] Permission denied' - Jupyter Labs on AWS SageMaker



I am using a JupyterLab instance on AWS SageMaker.

Kernel: conda_mxnet_latest_p37

url_list contains some invalid URLs, which I handle with exception handling.

['15', '259', '26', '58', 'https://imagepool.1und1-drillisch.de/v2/download/nachhaltigkeitsbericht/1&1Drillisch_Sustainability_Report_EN_2018.pdf', 'https://imagepool.1und1-drillisch.de//v2/download/nachhaltigkeitsbericht/2018-04-06_1und1-Drillisch_Sustainability_Report_eng.pdf', '6', 'http://youxin.37.com/uploads/file/1556248045.pdf', '80', 'https://multimedia.3m.com/mws/media/1691941O/2019-sustainability-report.PDF', 'https://s3-us-west-2.amazonaws.com/ungc-production/attachments/cop_2020/483648/original/GPIC_Sustainability_Report_2020__-_40_Years_of_Sustainable_Success.pdf?1583154650', 'https://drive.google.com/open?id=1_dnBcfXWjexy9QoWRhOk_3gnOkWfYRCw', 'http://aepsustainability.com/performance/docs/2020AEPGRIReport.pdf']  # sample

However, the URLs that do work throw the following error:

[Errno 13] Permission denied: '/data'

I don't have the directory or any of the files open, because I haven't downloaded them yet.

I had no luck running these commands in the terminal:

sh-4.2$ chmod 777 data
sh-4.2$ chmod 777 data/
sh-4.2$ chmod 777 data/gri
sh-4.2$ chmod 777 data/gri/

Code:

import pandas as pd
import numpy as np
import opendatasets as od
import urllib
import zipfile
import os
from PIL import Image

csr_df = pd.read_excel('data/Company Sustainability Reports.xlsx', index_col=None)
url_list = csr_df['Report PDF Address'].tolist()

for url in url_list:
    try:
        # Download each report into the gri directory
        download = od.download(url, '/data/gri/')
        filename = url.rsplit('/', 1)[1]
        path_extract = 'data/gri/' + filename
        with zipfile.ZipFile('data/gri/' + filename + '.zip', 'r') as zip_ref:
            zip_ref.extractall(path_extract)
        os.remove(path_extract + 'readme.txt')
        filenames = os.listdir(path_extract)
        scans = []
        # Load every extracted file as an image array
        for f in filenames:
            with Image.open(path_extract + f) as img:
                matrix = np.array(img)
                scans.append(matrix)
        # shutil.rmtree(path_extract)
        os.remove(path_extract[:-1] + '.zip')
    except (urllib.error.URLError, IOError, RuntimeError) as e:
        print('Download PDFs', e)

Output:

Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'imagepool.1und1-drillisch.de'. (_ssl.c:1091)>
Download PDFs <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'imagepool.1und1-drillisch.de'. (_ssl.c:1091)>
Download PDFs list index out of range
Download PDFs [Errno 13] Permission denied: '/data'
...
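
Since the sample list mixes bare strings such as '15' with real URLs, one option is to filter the list before downloading instead of relying only on exception handling. The following is a minimal sketch, not part of the original code: the helper name looks_like_url is hypothetical, and it only checks the shape of the string, not whether the server is reachable or its certificate is valid.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """Return True if value parses as an http(s) URL with a host part.

    Hypothetical helper for illustration; it does not contact the server.
    """
    parts = urlparse(str(value))
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# A few entries taken from the sample list above
sample = [
    '15',
    '259',
    'http://youxin.37.com/uploads/file/1556248045.pdf',
    'http://aepsustainability.com/performance/docs/2020AEPGRIReport.pdf',
]

# Keep only the entries that are shaped like URLs
valid_urls = [u for u in sample if looks_like_url(u)]
print(valid_urls)
```

Entries that pass this check can still fail at download time (for example with the SSL error shown above), so the try/except block remains useful.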

Let me know if there is anything else I should clarify.

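The call to download had a forward slash / as the first character of the save directory (the second argument). With the leading slash, '/data/gri/' is an absolute path under the filesystem root, not the data directory next to the notebook that the chmod commands were applied to. I removed the slash: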

download = od.download(url, 'data/gri/')

Output:

...
Downloading http://youxin.37.com/uploads/file/1556248045.pdf to data/gri/1556248045.pdf
450560it [00:02, 207848.59it/s]
...
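
To see where a save directory actually points before downloading, the path can be resolved and checked for write access. This is a minimal sketch assuming a POSIX system such as the SageMaker notebook host. The helper name describe_target is hypothetical and is not part of opendatasets.

```python
import os


def describe_target(path):
    """Report where a path resolves to and whether it looks writable.

    Hypothetical diagnostic helper; assumes a POSIX filesystem.
    """
    resolved = os.path.abspath(path)
    # Walk up to the nearest ancestor that already exists, since the
    # target directory itself may not have been created yet.
    probe = resolved
    while not os.path.exists(probe):
        probe = os.path.dirname(probe)
    return {
        "is_absolute": os.path.isabs(path),
        "resolved": resolved,
        "nearest_existing": probe,
        "writable": os.access(probe, os.W_OK),
    }


# Leading slash: resolved from the filesystem root
print(describe_target('/data/gri/'))
# No leading slash: resolved from the current working directory
print(describe_target('data/gri/'))
```

If "writable" is False for the nearest existing ancestor, creating the directory there would be expected to fail with a permission error like the one above.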
