I'm struggling with a volume attach error. I have a regional persistent disk in the same GCP project as my regional GKE cluster. The regional cluster is in europe-west2, with nodes in europe-west2-a, -b and -c. The regional disk is replicated between the europe-west2-b and -c zones.
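For context, the disk was created as a regional disk roughly like this (a sketch; the size and type flags here are placeholders, not necessarily what I used):

gcloud compute disks create my-regional-disk-name \
  --region europe-west2 \
  --replica-zones europe-west2-b,europe-west2-c \
  --size 200GB \
  --type pd-standard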
I have an nfs-server Deployment manifest that points at the disk via gcePersistentDisk:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations: {}
  labels:
    app.kubernetes.io/managed-by: Helm
  name: nfs-server
  namespace: namespace
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      serviceAccountName: nfs-server
      containers:
      - image: gcr.io/google_containers/volume-nfs:0.8
        imagePullPolicy: IfNotPresent
        name: nfs-server
        ports:
        - containerPort: 2049
          name: nfs
          protocol: TCP
        - containerPort: 20048
          name: mountd
          protocol: TCP
        - containerPort: 111
          name: rpcbind
          protocol: TCP
        securityContext:
          privileged: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: nfs-pvc
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - gcePersistentDisk:
          fsType: ext4
          pdName: my-regional-disk-name
        name: nfs-pvc
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.gke.io/zone
                operator: In
                values:
                - europe-west2-b
                - europe-west2-c
And my PV/PVC:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 200Gi
  nfs:
    path: /
    server: nfs-server.namespace.svc.cluster.local
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  name: nfs-pvc
  namespace: namespace
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 8Gi
  storageClassName: ""
  volumeMode: Filesystem
  volumeName: nfs-pv
When I apply the Deployment manifest above, I get the following error:
'rpc error: code = Unavailable desc = ControllerPublish not permitted on node "projects/ap-mc-qa-xxx-xxxx/zones/europe-west2-a/instances/node-instance-id" due to backoff condition'
The VolumeAttachment tells me:
Attach Error: Message: rpc error: code = NotFound desc = ControllerPublishVolume could not find volume with ID projects/UNSPECIFIED/zones/UNSPECIFIED/disks/my-regional-disk-name: googleapi: Error 0: , notFound
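(For reference, that message comes from the VolumeAttachment object, which can be inspected with the following; the attachment name is whatever shows up in the listing:)

kubectl get volumeattachments
kubectl describe volumeattachment <attachment-name>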
These manifests work just fine when deployed against a zonal cluster/disk. I've checked a few things, such as making sure the cluster's service account has the necessary permissions. The disk is not currently in use.
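("Not in use" meaning the disk reports no attached instances; I'm checking with something like:)

gcloud compute disks describe my-regional-disk-name \
  --region europe-west2 \
  --format="value(users, replicaZones)"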
What am I missing??
I think we should focus on the machine types of the nodes that make up the Kubernetes cluster.
Regional persistent disks cannot be used with memory-optimized or compute-optimized machines.
If using a regional persistent disk is not a hard requirement, consider using a non-regional persistent disk storage class instead. If it is a hard requirement, consider scheduling mechanisms such as taints and tolerations to ensure that pods needing a regional PD are scheduled onto a node pool of non-optimized machines (see the sketch after the link below).
https://cloud.google.com/kubernetes-engine/docs/troubleshooting#error_400_cannot_attach_repd_to_an_optimized_vm
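For example, a rough sketch (the dedicated=nfs taint, the nfs-pool name, and the my-cluster cluster name are hypothetical, not taken from the setup above):

# Create a node pool of general-purpose machines, tainted so only the NFS server lands on it:
gcloud container node-pools create nfs-pool \
  --cluster my-cluster --region europe-west2 \
  --machine-type e2-standard-4 \
  --node-taints dedicated=nfs:NoSchedule

# Then, in the Deployment's spec.template.spec, tolerate the taint and pin to the pool:
tolerations:
- key: dedicated
  operator: Equal
  value: nfs
  effect: NoSchedule
nodeSelector:
  cloud.google.com/gke-nodepool: nfs-pool  # label GKE applies to every node in a pool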
So the reason the approach above doesn't work: the regional persistent disk feature makes a persistent disk available in two zones within the same region, and to use that feature the volume must be provisioned as a PersistentVolume; referencing the volume directly from a pod is not supported. Like this:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 200Gi
  accessModes:
  - ReadWriteMany
  gcePersistentDisk:
    pdName: my-regional-disk
    fsType: ext4
Now to figure out how to reconfigure the NFS server to use the regional disk.
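A minimal sketch of what that wiring might look like (assuming the in-tree gce-pd plugin; nfs-disk-pv and nfs-disk-pvc are placeholder names): put the regional PD behind a PV/PVC pair with node affinity for the replica zones, and mount the claim in the Deployment instead of referencing the disk inline:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-disk-pv
spec:
  capacity:
    storage: 200Gi
  accessModes:
  - ReadWriteOnce  # GCE PDs do not support ReadWriteMany block attachment
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    pdName: my-regional-disk-name
    fsType: ext4
  # Pin the volume to the disk's replica zones so pods only schedule where it can attach
  # (older clusters may use the legacy failure-domain.beta.kubernetes.io/zone label instead)
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - europe-west2-b
          - europe-west2-c
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-disk-pvc
  namespace: namespace
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""
  volumeName: nfs-disk-pv
  resources:
    requests:
      storage: 200Gi
---
# In the nfs-server Deployment, the inline gcePersistentDisk volume would then become:
#   volumes:
#   - name: nfs-pvc
#     persistentVolumeClaim:
#       claimName: nfs-disk-pvc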