谷歌云对象检测模型训练错误



我在谷歌训练计算机视觉模型时遇到了问题,我确信问题与GPU有关。我知道谷歌说默认你有1个GPU把训练失败与此消息错误:"对8个K80加速器的请求超过了允许的最大值0 A100、0 K80、0 P100、0 P4、0 T4、0 TPU-V2、0 TPU_V2_POD、0 TP_V3、0 TPV_V3_POD、10 V100加速器">

你可以认为我有0从所有的加速器

这是我正在尝试运行的完整命令:

gcloud ai-platform jobs submit training segmentation_maskrcnn_test_0 ^
--runtime-version 2.1 ^
--python-version 3.7 ^
--job-dir=gs://image-segmentation-b/training-process ^
--package-path ./object_detection ^
--module-name object_detection.model_main_tf2 ^
--region us-central1 ^
--scale-tier CUSTOM ^
--master-machine-type n1-highcpu-32 ^
--master-accelerator count=8,type=nvidia-tesla-k80 ^
-- ^
--model_dir=gs://image-segmentation-b/training-process ^
--pipeline_config_path=gs:gs://image-segmentation-b/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8 - cloud.config

这是完整的错误:

ERROR: (gcloud.ai-platform.jobs.submit.training) HttpError accessing <https://ml.googleapis.com/v1/projects/project id/jobs?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'content-encoding': 'gzip', 'date': 'Tue, 18 Jan 2022 11:12:39 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"', 'transfer-encoding': 'chunked', 'status': 429}>, content <{
"error": {
"code": 429,
"message": "Quota failure for project project id. The request for 8 K80 accelerators exceeds the allowed maximum of 0 A100, 0 K80, 0 P100, 0 P4, 0 T4, 0 TPU_V2, 0 TPU_V2_POD, 0 TPU_V3, 0 TPU_V3_POD, 0 V100 accelerators. To read more about Cloud ML Engine quota, see https://cloud.google.com/ml-engine/quotas.",
"status": "RESOURCE_EXHAUSTED",
"details": [
{
"@type": "type.googleapis.com/google.rpc.QuotaFailure",
"violations": [
{
"subject": "project id",
"description": "The request for 8 K80 accelerators exceeds the allowed maximum of 0 A100, 0 K80, 0 P100, 0 P4, 0 T4, 0 TPU_V2, 0 TPU_V2_POD, 0 TPU_V3, 0 TPU_V3_POD, 0 V100 accelerators."
}
]
}
]
}
}
>
This may be due to network connectivity issues. Please check your network settings, and the status of the service you are trying to reach.

如何修复此错误?我必须去某个地方并为该项目启用GPU吗?

在训练模型之前,您需要提高GPU配额。

您的项目或帐户没有足够的GPU配额来满足您的请求。

您可以在这里查看您的配额:API配额

相关内容

  • 没有找到相关文章

最新更新