How does Kubernetes provide HA for stateful applications with attached volumes?

I am unable to configure my stateful application to be resilient to the failure of a Kubernetes worker (the one where my application pod is running).

$ kk get pod -owide
NAME                                READY   STATUS    RESTARTS   AGE     IP                NODE               NOMINATED NODE   READINESS GATES
example-openebs-97767f45f-xbwp6     1/1     Running   0          6m21s   192.168.207.233   new-kube-worker1   <none>           <none>

As soon as I take the worker down, Kubernetes notices that the pod is not responding and schedules it onto a different worker.

marek649@new-kube-master:~$ kk get pod -owide
NAME                                READY   STATUS              RESTARTS   AGE   IP                NODE               NOMINATED NODE   READINESS GATES
example-openebs-97767f45f-gct5b     0/1     ContainerCreating   0          22s   <none>            new-kube-worker2   <none>           <none>
example-openebs-97767f45f-xbwp6     1/1     Terminating         0          13m   192.168.207.233   new-kube-worker1   <none>           <none>

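This is great, but the new container cannot start because it tries to attach the same PVC that the old container was using, and Kubernetes does not release the binding to the old node (which is not responding).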

$ kk describe pod example-openebs-97767f45f-gct5b
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/example-openebs-97767f45f
Containers:
example-openebs:
Container ID:   
Image:          nginx
Image ID:       
Port:           80/TCP
Host Port:      0/TCP
State:          Waiting
Reason:       ContainerCreating
Ready:          False
Restart Count:  0
Environment:    <none>
Mounts:
/usr/share/nginx/html from demo-claim (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4xmvf (ro)
Conditions:
Type              Status
Initialized       True 
Ready             False 
ContainersReady   False 
PodScheduled      True 
Volumes:
demo-claim:
Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName:  example-pvc
ReadOnly:   false
default-token-4xmvf:
Type:        Secret (a volume populated by a Secret)
SecretName:  default-token-4xmvf
Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type     Reason              Age   From                       Message
----     ------              ----  ----                       -------
Normal   Scheduled           2m9s  default-scheduler          Successfully assigned default/example-openebs-97767f45f-gct5b to new-kube-worker2
Warning  FailedAttachVolume  2m9s  attachdetach-controller    Multi-Attach error for volume "pvc-911f94a9-b43a-4cac-be94-838b0e7376e8" Volume is already used by pod(s) example-openebs-97767f45f-xbwp6
Warning  FailedMount         6s    kubelet, new-kube-worker2  Unable to attach or mount volumes: unmounted volumes=[demo-claim], unattached volumes=[demo-claim default-token-4xmvf]: timed out waiting for the condition

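I can resolve the situation by manually force-deleting the containers, unbinding the PV, and recreating the containers, but this is far from the high availability I am expecting.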

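I am using OpenEBS Jiva volumes, and after the manual intervention I am able to restore the container with the correct data on the PV, which means the data is replicated correctly to the other nodes.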

Can someone explain what I am doing wrong and how to achieve fault tolerance for Kubernetes applications with attached volumes?

I found this related issue, but I don't see any suggestions there on how to overcome the problem: https://github.com/openebs/openebs/issues/2536

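It will eventually release the volume; usually the limiting factor is that the network storage system is slow to detect that the volume has been unmounted. But you are correct that this is a limitation. The usual fix is to use a volume type that supports multi-mount instead, such as NFS or CephFS.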
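As a rough illustration of the multi-mount approach, a claim can request the ReadWriteMany access mode so that a replacement pod on another node is not blocked by the old attachment. This is only a sketch: the claim name is taken from the question, while the storage class name `nfs-client` and the requested size are assumptions, and the class must be backed by a provisioner that actually supports ReadWriteMany (for example an NFS or CephFS provisioner).

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc            # claim name from the question
spec:
  accessModes:
    - ReadWriteMany            # allows mounting from more than one node at a time
  storageClassName: nfs-client # hypothetical class; must support RWX
  resources:
    requests:
      storage: 5Gi             # assumed size, not stated in the question
```

With an RWX volume the application itself has to tolerate two pods briefly seeing the same files, which is fine for static content like the nginx example but may not be for a database.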

To deploy stateful applications, Kubernetes has the StatefulSet object, which might help you in this case.

StatefulSets are valuable for applications that require one or more of the following:

Stable, unique network identifiers
Stable, persistent storage
Ordered, graceful deployment and scaling
Ordered, automated rolling updates
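A minimal sketch of what the nginx example from the question might look like as a StatefulSet is shown below. The container, port, and mount path come from the `describe` output; the label `app: example-openebs`, the headless Service name, the storage class `openebs-jiva-default`, and the volume size are assumptions and would need to match the actual cluster.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-openebs
spec:
  serviceName: example-openebs      # assumed headless Service, created separately
  replicas: 1
  selector:
    matchLabels:
      app: example-openebs          # hypothetical label
  template:
    metadata:
      labels:
        app: example-openebs
    spec:
      containers:
        - name: example-openebs
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: demo-claim
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:             # one PVC is created per replica
    - metadata:
        name: demo-claim
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: openebs-jiva-default  # assumed storage class name
        resources:
          requests:
            storage: 5Gi            # assumed size
```

Note that this gives each replica a stable identity and its own claim; it does not by itself detach a ReadWriteOnce volume from a node that has become unreachable.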

For unmanaged Kubernetes clusters, this is a hard problem that applies to all types of RWO volumes.

There have been several discussions about this in the Kubernetes community, which are summarized in:

https://github.com/kubernetes/enhancements/pull/1116
https://github.com/kubernetes/kubernetes/issues/86281
https://github.com/kubernetes/kubernetes/issues/53059

The current thinking is to come up with a solution with the help of NodeTolerations, and to implement it via the CSI driver.
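For context, the tolerations involved are the ones visible in the `describe` output above, where the pod tolerates `not-ready` and `unreachable` nodes for 300s before being evicted. The sketch below shows how a Deployment could shorten that window. The 30-second value, the label, and the overall Deployment layout are assumptions for illustration. This only affects how quickly the pod is evicted and rescheduled; it is not the CSI-level solution discussed in the linked issues, and it does not by itself release the volume from the failed node.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-openebs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-openebs          # hypothetical label
  template:
    metadata:
      labels:
        app: example-openebs
    spec:
      tolerations:
        # Evict the pod 30s after the node is marked not-ready or unreachable
        # (the default shown in the question is 300s). 30 is an assumed value.
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 30
        - key: node.kubernetes.io/unreachable
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 30
      containers:
        - name: example-openebs
          image: nginx
          volumeMounts:
            - name: demo-claim
              mountPath: /usr/share/nginx/html
      volumes:
        - name: demo-claim
          persistentVolumeClaim:
            claimName: example-pvc
```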

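At OpenEBS, when we looked at how cloud providers handle this case, we found that when a node is shut down, its corresponding node object is deleted from the cluster. This operation does no harm, because the node object is recreated when the node comes back online.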
