自动下载公共GDrive文件夹中的大文件



我的最终目标是用python(例如gdown(自动下载公共GDrive文件夹中的所有文件(每个文件都像3G一样大(。经过大量的尝试,我终于找到了一种方法,使用谷歌表单中的谷歌脚本从文件夹中提取所有链接,这样我就有了所有文件的链接,我需要以这种格式下载:

https://drive.google.com/file/d/IDA/view?usp=drivesdk&resourcekey=otherIDA
https://drive.google.com/file/d/IDB/view?usp=drivesdk&resourcekey=otherIDB
https://drive.google.com/file/d/IDC/view?usp=drivesdk&resourcekey=otherIDC
...
https://drive.google.com/file/d/IDZ/view?usp=drivesdk&resourcekey=otherIDZ

然后我想用for循环迭代链接以下载所有文件:

import gdown
import re
regex = "([w-]){33}|([w-]){19}"
download_url_basename = "https://drive.google.com/uc?export=download&id="
for i, l in enumerate(links_to_download):
file_id = re.search(regex, url)[0]
gdown.download(download_url_basename + file_id, f"file_{i}")

然而,我遇到了:

Permission denied: https://drive.google.com/uc?id=ID
Maybe you need to change permission over 'Anyone with the link'?

这是一个公共存储库,所以尽管我可以访问它,并且有足够的权限手动下载每个文件,但我只能在视图模式下获得可共享的链接。

有没有一种方法可以自动将链接转换为可以自动下载的内容?它是故意屏蔽的吗?有没有什么方法可以自动完成而不是手动下载400个文件?

编辑:这个问题略有关联,但这个问题不是源于同一个问题,也没有自动处理任何问题。

编辑2:我使用google drive API python SDK,通过google控制台生成服务帐户,激活OAuth2并生成OAuth2 json凭据来构建drive_service对象:

from google_auth_oauthlib.flow import Flow, InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload
from google.auth.transport.requests import Request
import io
import re
SCOPES = ['https://www.googleapis.com/auth/drive']
CLIENT_SECRET_FILE = "myjson.json"
authorized_port = 6006 # authorize URI redirect on the console
flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
cred = flow.run_local_server(port=authorized_port)
drive_service = build("drive", "v3", credentials=cred)
download_url_basename = "https://drive.google.com/uc?id="
regex = "([w-]){33}|([w-]){19}"
for i, l in enumerate(links_to_download):
url = l
file_id = re.search(regex, url)[0]
request = drive_service.files().get_media(fileId=file_id)
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
status, done = downloader.next_chunk()
print("Download %d%%." % int(status.progress() * 100))

然而,我现在得到:

googleapiclient.errors.HttpError: <HttpError 404 when requesting https://www.googleapis.com/drive/v3/files/fileId?alt=media returned "File not found: fileID.". Details: "[{'domain': 'global', 'reason': 'notFound', 'message': 'File not found: fileId.', 'locationType': 'parameter', 'location': 'fileId'}]">

找到一个相关问题知道吗?

好吧,多亏了Google API,我终于可以让它工作了!

从获得文件夹中的链接列表到下载它们,整个过程都很麻烦,我可能有一天会写一篇博客文章:

from google_auth_oauthlib.flow import Flow, InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload
from google.auth.transport.requests import Request
import io
import re
SCOPES = ['https://www.googleapis.com/auth/drive']
CLIENT_SECRET_FILE = "myjson.json"
authorized_port = 6006 # authorize URI redirect on the console
flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRET_FILE, SCOPES)
cred = flow.run_local_server(port=authorized_port)
drive_service = build("drive", "v3", credentials=cred)
regex = "(?<=https://drive.google.com/file/d/)[a-zA-Z0-9]+"
for i, l in enumerate(links_to_download):
url = l
file_id = re.search(regex, url)[0]
request = drive_service.files().get_media(fileId=file_id)
fh = io.FileIO(f"file_{i}", mode='wb')
downloader = MediaIoBaseDownload(fh, request)
done = False
while done is False:
status, done = downloader.next_chunk()
print("Download %d%%." % int(status.progress() * 100))

相关内容

  • 没有找到相关文章

最新更新