Downloading a large folder from Google Drive using Python



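I am trying to download a large folder containing 50,000 images from my Google Drive to a local server using Python. The code below fails with a limit error. Is there another solution?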

import gdown
url = 'https://drive.google.com/drive/folders/135hTTURfjn43fo4f?usp=sharing'  # I'm showing a fake token
gdown.download_folder(url)

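Failed to retrieve folder contents:

The gdrive folder with url: https://drive.google.com/drive/folders/135hTTURfjn43fo4f?usp=sharing has at least 50 files, gdrive can't download more than this limit, if you are ok with this, please run again with --remaining-ok flag.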

As kite mentioned in the comments, use the remaining_ok flag.

gdown.download_folder(url, remaining_ok=True)

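This is not mentioned at https://pypi.org/project/gdown/, so it can cause some confusion.

Apart from the warning message and the GitHub source code, there are no references to remaining_ok anywhere.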

Edit:

It seems gdown is strictly limited to 50 files per folder, and I have not found a way around that.

If using something other than gdown is an option, see the code below.

Script:

import io
import os
import os.path
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account
credential_json = {
    ### Create a service account and use its JSON content here ###
    ### https://cloud.google.com/docs/authentication/getting-started#creating_a_service_account
    ### credentials.json looks like this:
    "type": "service_account",
    "project_id": "*********",
    "private_key_id": "*********",
    "private_key": "-----BEGIN PRIVATE KEY-----\n*********\n-----END PRIVATE KEY-----\n",
    "client_email": "service-account@*********.iam.gserviceaccount.com",
    "client_id": "*********",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account%40*********.iam.gserviceaccount.com"
}
credentials = service_account.Credentials.from_service_account_info(credential_json)
drive_service = build('drive', 'v3', credentials=credentials)
folderId = '### Google Drive Folder ID ###'
outputFolder = 'output'
# Create folder if not existing
if not os.path.isdir(outputFolder):
    os.mkdir(outputFolder)
# Collect every file in the folder, following nextPageToken across pages
items = []
pageToken = ""
while pageToken is not None:
    response = drive_service.files().list(q="'" + folderId + "' in parents", pageSize=1000, pageToken=pageToken,
                                          fields="nextPageToken, files(id, name)").execute()
    items.extend(response.get('files', []))
    pageToken = response.get('nextPageToken')
for file in items:
    file_id = file['id']
    file_name = file['name']
    request = drive_service.files().get_media(fileId=file_id)
    ### Saves all files under outputFolder
    # The with block closes the file handle once the download finishes
    with io.FileIO(os.path.join(outputFolder, file_name), 'wb') as fh:
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
    print(f'{file_name} downloaded completely.')
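
The paging loop in the script can be isolated and tried without network access. This is only a minimal sketch: `list_page` is a hypothetical callable standing in for the `drive_service.files().list(...).execute()` call, and `fake_list_page` is made-up data used purely to show how `nextPageToken` is followed.

```python
def iter_drive_files(list_page):
    """Yield every file entry, following nextPageToken until it is absent.

    list_page is a hypothetical callable: it takes a page token (None for
    the first page) and returns a dict shaped like a files.list response.
    """
    page_token = None
    while True:
        response = list_page(page_token)
        yield from response.get('files', [])
        page_token = response.get('nextPageToken')
        if page_token is None:
            break


# Fake two-page listing, used only to demonstrate the loop.
def fake_list_page(page_token):
    pages = {
        None: {'files': [{'id': 'a', 'name': '1.jpg'}], 'nextPageToken': 't2'},
        't2': {'files': [{'id': 'b', 'name': '2.jpg'}]},
    }
    return pages[page_token]


names = [f['name'] for f in iter_drive_files(fake_list_page)]
print(names)
```

With the real client, `list_page` would wrap the `files().list(...)` request from the script above.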

References:

  • https://cloud.google.com/docs/authentication/getting-started#creating_a_service_account

The download limit is set in ../gdown/download_folder.py

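Here is a workaround I used for downloading the URLs with gdown:
  • Go to the Drive directory containing the files you need to download
  • Select all the files with Ctrl/Cmd + A, click Share, and copy all the links
  • Now use the following Python script to do the job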
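import re
import os

# Paste the copied links here as one comma-separated string
urls = '<copied_urls>'
url_list = urls.split(', ')
# '.' and '?' are escaped so they match literally
pat = re.compile(r'https://drive\.google\.com/file/d/(.*)/view\?usp=sharing')
for url in url_list:
    g = re.match(pat, url)
    if g is None:
        continue  # skip anything that is not a file link
    file_id = g.group(1)
    down_url = f'https://drive.google.com/uc?id={file_id}'
    os.system(f'gdown {down_url}')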

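Note: this solution is not suitable for 50,000 images, because the copied URL string becomes too large. If your string is large, copy it into a file and process that instead of using a variable. In my case, I had to copy 75 large files.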
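
Reading the links from a file, as the note suggests, might look like the sketch below. The function names and the `urls.txt` path mentioned afterwards are hypothetical, and the pattern assumes the copied links have the `https://drive.google.com/file/d/<id>/view...` form used in the script above.

```python
import re

# Matches the file ID in a Drive share link; assumes the /file/d/<id>/view form.
FILE_LINK = re.compile(r'https://drive\.google\.com/file/d/([^/]+)/view')


def download_urls_from_text(text):
    """Return one direct-download URL per Drive file link found in text."""
    return [f'https://drive.google.com/uc?id={file_id}'
            for file_id in FILE_LINK.findall(text)]


def download_urls_from_file(path):
    """Read the copied links from a text file and convert them."""
    with open(path, encoding='utf-8') as fh:
        return download_urls_from_text(fh.read())


sample = ('https://drive.google.com/file/d/abc/view?usp=sharing, '
          'https://drive.google.com/file/d/def/view?usp=sharing')
print(download_urls_from_text(sample))
```

Each returned URL can then be passed to gdown as in the loop above, for example after calling `download_urls_from_file('urls.txt')`.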

!pip uninstall --yes gdown # After running this line, restart Colab runtime.
!pip install gdown -U --no-cache-dir
import gdown
url = r'https://drive.google.com/drive/folders/1sWD6urkwyZo8ZyZBJoJw40eKK0jDNEni'
gdown.download_folder(url)