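I am trying to download a large folder containing 50,000 images from my Google Drive to a local server using Python. The code below fails with a limit error. Is there an alternative solution?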
import gdown
url = 'https://drive.google.com/drive/folders/135hTTURfjn43fo4f?usp=sharing' # I'm showing a fake token
gdown.download_folder(url)
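Failed to retrieve folder contents:
The gdrive folder with url: https://drive.google.com/drive/folders/135hTTURfjn43fo4f?usp=sharing has at least 50 files, gdrive can't download more than this limit, if you are ok with this, please run again with --remaining-ok flag.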
As kite mentioned in the comments, use the remaining_ok flag.
gdown.download_folder(url, remaining_ok=True)
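This is not mentioned at https://pypi.org/project/gdown/, so it can cause some confusion.
Apart from the warning and this GitHub code, no reference to remaining_ok is available.
It seems that gdown is strictly limited to 50 files per folder, and I have not found a way around that.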
If something other than gdown is an option, see the code below.
Script:
import io
import os
import os.path
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account
credential_json = {
    ### Create a service account and paste the content of its JSON key here ###
    ### https://cloud.google.com/docs/authentication/getting-started#creating_a_service_account
    ### credentials.json looks like this:
    "type": "service_account",
    "project_id": "*********",
    "private_key_id": "*********",
    "private_key": "-----BEGIN PRIVATE KEY-----\n*********\n-----END PRIVATE KEY-----\n",
    "client_email": "service-account@*********.iam.gserviceaccount.com",
    "client_id": "*********",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account%40*********.iam.gserviceaccount.com"
}
credentials = service_account.Credentials.from_service_account_info(credential_json)
drive_service = build('drive', 'v3', credentials=credentials)
folderId = '### Google Drive Folder ID ###'
outputFolder = 'output'
# Create the output folder if it does not exist yet
os.makedirs(outputFolder, exist_ok=True)
# Collect every file in the folder, following nextPageToken until the last page
items = []
pageToken = None
while True:
    response = drive_service.files().list(q="'" + folderId + "' in parents", pageSize=1000, pageToken=pageToken,
                                          fields="nextPageToken, files(id, name)").execute()
    items.extend(response.get('files', []))
    pageToken = response.get('nextPageToken')
    if pageToken is None:
        break
for file in items:
    file_id = file['id']
    file_name = file['name']
    request = drive_service.files().get_media(fileId=file_id)
    # Save every file directly under outputFolder
    with io.FileIO(os.path.join(outputFolder, file_name), 'wb') as fh:
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
    print(f'{file_name} downloaded completely.')
References:
- https://cloud.google.com/docs/authentication/getting-started#creating_a_service_account
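With 50,000 files, a run may well be interrupted before it finishes. Below is a minimal sketch, assuming the `items` list of `{'id', 'name'}` dicts built by the script above, of skipping files that already exist in the output folder so the script can simply be rerun. `pending_files` is a hypothetical helper, not part of the Google API client, and it assumes that any non-empty local file with the same name is complete, so a file that was only partially written would be missed.

```python
import os


def pending_files(items, output_dir):
    """Return the items whose target file is missing or empty in output_dir."""
    remaining = []
    for item in items:
        path = os.path.join(output_dir, item['name'])
        # Assumption: a non-empty local file means the download already finished.
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            continue
        remaining.append(item)
    return remaining
```

In the script above, `for file in pending_files(items, outputFolder):` could then take the place of `for file in items:`.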
The download limit is set in ../gdown/download_folder.py
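Here is a workaround I used to download the URLs with gdown:
- Go to the Drive directory that contains the files you need to download
- Select all files with Ctrl/Cmd + A, click Share, and copy all the links
- Now use the following Python script to do the job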
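import re
import os

urls = '<copied_urls>'  # paste the copied, comma-separated links here
url_list = urls.split(', ')
# Escape '.' and '?' so they match literally; the group captures the file ID
pat = re.compile(r'https://drive\.google\.com/file/d/(.*)/view\?usp=sharing')
for url in url_list:
    g = re.match(pat, url)
    file_id = g.group(1)
    down_url = f'https://drive.google.com/uc?id={file_id}'
    os.system(f'gdown {down_url}')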
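Note: this solution is not suitable for 50,000 images, because the copied URL string would be too large. If your string is very large, copy it into a file and process it from there instead of using a variable. In my case, I had to copy 75 large files.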
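The note's suggestion of reading the links from a file could look roughly like this. It is a minimal sketch, assuming the copied links were pasted into a text file and have the form `https://drive.google.com/file/d/<id>/view?usp=sharing`. `extract_file_ids` and `download_all` are illustrative names, not part of gdown; the only gdown call used is `gdown.download`, with the same `uc?id=` URL form as the script above.

```python
import re

# Captures the file ID in links of the form .../file/d/<id>/view...
FILE_ID_PATTERN = re.compile(r'https://drive\.google\.com/file/d/([^/\s,]+)')


def extract_file_ids(text):
    """Return the Drive file IDs found in text, in order, without duplicates."""
    seen = set()
    ids = []
    for file_id in FILE_ID_PATTERN.findall(text):
        if file_id not in seen:
            seen.add(file_id)
            ids.append(file_id)
    return ids


def download_all(links_path):
    """Download every file whose share link appears in the file at links_path."""
    import gdown  # imported here so the parsing helper works without gdown

    with open(links_path, encoding='utf-8') as f:
        file_ids = extract_file_ids(f.read())
    for file_id in file_ids:
        gdown.download(f'https://drive.google.com/uc?id={file_id}')
```

Calling `download_all('links.txt')`, where `links.txt` is a hypothetical file holding the pasted links, would then fetch each file in turn into the current directory.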
!pip uninstall --yes gdown # After running this line, restart Colab runtime.
!pip install gdown -U --no-cache-dir
import gdown
url = r'https://drive.google.com/drive/folders/1sWD6urkwyZo8ZyZBJoJw40eKK0jDNEni'
gdown.download_folder(url)