我试图在Composer 2环境中运行GKEStartPodOperator/KubernetesPodOperator任务,这使得在自动驾驶模式下使用GKE集群。我们有一个现有的Composer 1环境,其GKE集群不处于自动驾驶模式。我们使用Google云平台服务(BigQuery, GCS等)进行身份验证的任务,在Composer 2环境中由于401未授权而失败,但在Composer 1环境中成功。
在日志文件中,我可以看出两个环境中的任务通过对元数据服务器的请求获得它们的凭据。关键的区别在于Composer 1中的任务请求分配给任务运行所在节点的服务帐户,而Composer 2中的任务请求的似乎是工作负载标识池,如[project-name].svc.id.goog
。
Composer 1的日志如下:
[2021-10-22 12:38:01,349] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,351] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,352] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,359] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,374] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-22 12:38:01,392] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,393] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,393] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,395] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,398] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-22 12:38:01,412] {pod_launcher.py:148} INFO - DEBUG:google.cloud.bigquery.opentelemetry_tracing:This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-22 12:38:01,414] {pod_launcher.py:148} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-22 12:38:01,415] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
[2021-10-22 12:38:01,437] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): metadata.google.internal:80
[2021-10-22 12:38:01,452] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 226
[2021-10-22 12:38:01,454] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[project-id]-compute@developer.gserviceaccount.com/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform
[2021-10-22 12:38:01,463] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/[project-id]-compute@developer.gserviceaccount.com/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform HTTP/1.1" 200 1049
[2021-10-22 12:38:01,468] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): bigquery.googleapis.com:443
[2021-10-22 12:38:02,028] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "POST /bigquery/v2/projects/[project-nam]/jobs?prettyPrint=false HTTP/1.1" 200 None
Composer 2的日志如下:
[2021-10-21 13:56:06,619] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,620] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,620] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,621] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,624] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-21 13:56:06,634] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-21 13:56:06,641] {pod_launcher.py:149} INFO - DEBUG:google.cloud.bigquery.opentelemetry_tracing:This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-21 13:56:06,642] {pod_launcher.py:149} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-21 13:56:06,642] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
[2021-10-21 13:56:06,714] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): metadata.google.internal:80
[2021-10-21 13:56:06,720] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 121
[2021-10-21 13:56:06,721] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[project-name].svc.id.goog/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform
[2021-10-21 13:56:06,831] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/[project-name].svc.id.goog/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform HTTP/1.1" 200 765
[2021-10-21 13:56:06,833] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): bigquery.googleapis.com:443
[2021-10-21 13:56:06,866] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "POST /bigquery/v2/projects/[project-name]/jobs?prettyPrint=false HTTP/1.1" 401 None
基于工作负载身份文档,我猜我需要将特定的服务帐户绑定到运行pod的节点/节点池,但我不确定如何使用Composer 2 GKE Autopilot做到这一点,因为节点是为我管理的。Composer 2目前没有关于使用KubernetesPodOperator或GKEStartPodOperator的可用文档。
总之,我的问题是:我应该如何配置我的Composer 2环境PodOperator任务,以利用特定的服务帐户对GCP服务进行身份验证?
我从运营工程师那里得到了一些指导,现在KubernetesPodOperator任务通过服务帐户成功地与GCP服务进行了身份验证。我将在下面分享步骤和有用的信息。
首先,按照使用工作负载身份对Google Cloud进行身份验证的步骤进行操作。我以为Composer 2已经配置了kubernetes <>谷歌云服务账户绑定和注释对我来说,但事实并非如此。我必须创建名称空间、kubernetes服务帐户、ksa和gsa的绑定以及ksa的注释,就像说明所说的那样。
第二,我必须更新我的KubernetesPodOperator实例,将参数namespace
和service_account_name
设置为我在第一步中创建的命名空间和kubernetes服务帐户。
DAG的上传和任务的执行,我可以确认这两个步骤使我的任务能够请求绑定的Google服务帐户,并且从那里Google客户端库身份验证在我针对BigQuery的测试中成功。