Google Cloud object detection model training error

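I'm having a problem training a computer vision model on Google Cloud, and I'm sure it has something to do with GPUs. I know Google says you get 1 GPU by default, but the training fails with this error message: "The request for 8 K80 accelerators exceeds the allowed maximum of 0 A100, 0 K80, 0 P100, 0 P4, 0 T4, 0 TPU_V2, 0 TPU_V2_POD, 0 TPU_V3, 0 TPU_V3_POD, 0 V100 accelerators."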

As you can see, I have 0 of every accelerator type.

This is the full command I'm trying to run:

gcloud ai-platform jobs submit training segmentation_maskrcnn_test_0 ^
--runtime-version 2.1 ^
--python-version 3.7 ^
--job-dir=gs://image-segmentation-b/training-process ^
--package-path ./object_detection ^
--module-name object_detection.model_main_tf2 ^
--region us-central1 ^
--scale-tier CUSTOM ^
--master-machine-type n1-highcpu-32 ^
--master-accelerator count=8,type=nvidia-tesla-k80 ^
-- ^
--model_dir=gs://image-segmentation-b/training-process ^
--pipeline_config_path=gs://image-segmentation-b/mask_rcnn_inception_resnet_v2_1024x1024_coco17_gpu-8 - cloud.config

This is the full error:

ERROR: (gcloud.ai-platform.jobs.submit.training) HttpError accessing <https://ml.googleapis.com/v1/projects/project id/jobs?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'content-encoding': 'gzip', 'date': 'Tue, 18 Jan 2022 11:12:39 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"', 'transfer-encoding': 'chunked', 'status': 429}>, content <{
"error": {
"code": 429,
"message": "Quota failure for project project id. The request for 8 K80 accelerators exceeds the allowed maximum of 0 A100, 0 K80, 0 P100, 0 P4, 0 T4, 0 TPU_V2, 0 TPU_V2_POD, 0 TPU_V3, 0 TPU_V3_POD, 0 V100 accelerators. To read more about Cloud ML Engine quota, see https://cloud.google.com/ml-engine/quotas.",
"status": "RESOURCE_EXHAUSTED",
"details": [
{
"@type": "type.googleapis.com/google.rpc.QuotaFailure",
"violations": [
{
"subject": "project id",
"description": "The request for 8 K80 accelerators exceeds the allowed maximum of 0 A100, 0 K80, 0 P100, 0 P4, 0 T4, 0 TPU_V2, 0 TPU_V2_POD, 0 TPU_V3, 0 TPU_V3_POD, 0 V100 accelerators."
}
]
}
]
}
}
>
This may be due to network connectivity issues. Please check your network settings, and the status of the service you are trying to reach.

How do I fix this error? Do I have to go somewhere and enable GPUs for this project?

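You need to increase your GPU quota before you can train the model.

The error is a quota failure (HTTP 429, RESOURCE_EXHAUSTED), not a network problem: your project or account does not have enough GPU quota to satisfy the request. You asked for 8 K80 accelerators, and the allowed maximum for K80 is 0.

You can view your quotas here: API Quotas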
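If it helps to see exactly what the error is reporting, here is a minimal sketch that parses the "message" field from the response in the question and compares the requested accelerator count against the allowed maximum. The function name `parse_quota_message` and the regular expressions are my own illustration, not part of gcloud or any Google library, and they assume the message keeps the wording shown in the question.

```python
import re

# Text copied from the "message" field of the error response in the question.
MESSAGE = (
    "Quota failure for project project id. The request for 8 K80 accelerators "
    "exceeds the allowed maximum of 0 A100, 0 K80, 0 P100, 0 P4, 0 T4, "
    "0 TPU_V2, 0 TPU_V2_POD, 0 TPU_V3, 0 TPU_V3_POD, 0 V100 accelerators."
)


def parse_quota_message(message):
    """Hypothetical helper: return (requested_count, requested_type, allowed).

    `allowed` maps each accelerator type named in the message to its limit.
    """
    requested = re.search(r"request for (\d+) (\w+) accelerators", message)
    allowed_part = re.search(r"allowed maximum of (.+?) accelerators", message)
    if requested is None or allowed_part is None:
        raise ValueError("message does not look like an accelerator quota failure")
    # Each entry in the allowed list has the form "<count> <TYPE>".
    allowed = {
        name: int(limit)
        for limit, name in re.findall(r"(\d+) (\w+)", allowed_part.group(1))
    }
    return int(requested.group(1)), requested.group(2), allowed


count, kind, allowed = parse_quota_message(MESSAGE)
print(f"requested {count} x {kind}, allowed {allowed[kind]}")

# List the accelerator types the project could actually use right now.
usable = [name for name, limit in allowed.items() if limit > 0]
print("accelerator types with non-zero quota:", usable or "none")
```

For the message in the question, every limit is 0, so no accelerator type is usable until the quota is raised; changing `type=` or `count=` in `--master-accelerator` would not get around it.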
