CockroachDB cluster on Kubernetes: Pods crashing



I am trying to install the CockroachDB Helm chart on a 2-node Kubernetes cluster using the following command:

helm install my-release --set statefulset.replicas=2 stable/cockroachdb

I have created 2 persistent volumes:

NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                          STORAGECLASS   REASON   AGE
pv00001   100Gi      RWO            Recycle          Bound    default/datadir-my-release-cockroachdb-0                           11m
pv00002   100Gi      RWO            Recycle          Bound    default/datadir-my-release-cockroachdb-1                           11m

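I am getting a strange error, and I am new to Kubernetes, so I am not sure what I am doing wrong. I tried creating a StorageClass and using it with the PVs, but the CockroachDB PVCs would not bind to them. I suspect there may be something wrong with my PV setup?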

I have tried kubectl logs, but the only error I see is:

standard_init_linux.go:211: exec user process caused "exec format error"

The pods crash over and over:

NAME                                    READY   STATUS             RESTARTS   AGE
my-release-cockroachdb-0            0/1     Pending            0          11m
my-release-cockroachdb-1            0/1     CrashLoopBackOff   7          11m
my-release-cockroachdb-init-tfcks   0/1     CrashLoopBackOff   5          5m29s

Any idea why the pods are crashing?

Here is the kubectl describe output for the init pod:

Name:         my-release-cockroachdb-init-tfcks
Namespace:    default
Priority:     0
Node:         axon/192.168.1.7
Start Time:   Sat, 04 Apr 2020 00:22:19 +0100
Labels:       app.kubernetes.io/component=init
app.kubernetes.io/instance=my-release
app.kubernetes.io/name=cockroachdb
controller-uid=54c7c15d-eb1c-4392-930a-d9b8e9225a45
job-name=my-release-cockroachdb-init
Annotations:  <none>
Status:       Running
IP:           10.44.0.1
IPs:
IP:           10.44.0.1
Controlled By:  Job/my-release-cockroachdb-init
Containers:
cluster-init:
Container ID:  docker://82a062c6862a9fd5047236feafe6e2654ec1f6e3064fd0513341a1e7f36eaed3
Image:         cockroachdb/cockroach:v19.2.4
Image ID:      docker-pullable://cockroachdb/cockroach@sha256:511b6d09d5bc42c7566477811a4e774d85d5689f8ba7a87a114b96d115b6149b
Port:          <none>
Host Port:     <none>
Command:
/bin/bash
-c
while true; do initOUT=$(set -x; /cockroach/cockroach init --insecure --host=my-release-cockroachdb-0.my-release-cockroachdb:26257 2>&1); initRC="$?"; echo $initOUT; [[ "$initRC" == "0" ]] && exit 0; [[ "$initOUT" == *"cluster has already been initialized"* ]] && exit 0; sleep 5; done
State:          Waiting
Reason:       CrashLoopBackOff
Last State:     Terminated
Reason:       Error
Exit Code:    1
Started:      Sat, 04 Apr 2020 00:28:04 +0100
Finished:     Sat, 04 Apr 2020 00:28:04 +0100
Ready:          False
Restart Count:  6
Environment:    <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-cz2sn (ro)
Conditions:
Type              Status
Initialized       True 
Ready             False 
ContainersReady   False 
PodScheduled      True 
Volumes:
default-token-cz2sn:
Type:        Secret (a volume populated by a Secret)
SecretName:  default-token-cz2sn
Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type     Reason     Age                   From               Message
----     ------     ----                  ----               -------
Normal   Scheduled  <unknown>             default-scheduler  Successfully assigned default/my-release-cockroachdb-init-tfcks to axon
Normal   Pulled     5m9s (x5 over 6m45s)  kubelet, axon      Container image "cockroachdb/cockroach:v19.2.4" already present on machine
Normal   Created    5m8s (x5 over 6m45s)  kubelet, axon      Created container cluster-init
Normal   Started    5m8s (x5 over 6m44s)  kubelet, axon      Started container cluster-init
Warning  BackOff    92s (x26 over 6m42s)  kubelet, axon      Back-off restarting failed container

When Pods crash, the most important troubleshooting sources are their description (kubectl describe) and their logs.
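As a minimal sketch of that first step, the two commands can be wrapped in a small helper. The function name `pod_diagnostics` is made up for illustration, and the pod name in the usage comment is taken from the listing in the question. `--previous` asks for the logs of the last terminated container, which is usually the interesting one when a pod is in CrashLoopBackOff.

```shell
# Hypothetical helper: print the description and the logs of one pod.
# Usage: pod_diagnostics my-release-cockroachdb-1 [namespace]
pod_diagnostics() {
  pod="$1"
  ns="${2:-default}"   # namespace defaults to "default", as in the question

  # Events, container state, exit codes, image and node assignment.
  kubectl describe pod "$pod" --namespace "$ns"

  # Logs of the current container, then of the previous (crashed) one.
  kubectl logs "$pod" --namespace "$ns"
  kubectl logs "$pod" --namespace "$ns" --previous
}
```

The second `kubectl logs` call may itself report an error if the container has not restarted yet; that is harmless here.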

The logs of the failing Pod show that the architecture (arch) of the cockroach image does not match that of the node.

Run kubectl get po -o wide to find the nodes the cockroach pods are running on, then check the architecture of those nodes.
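A rough sketch of that check, assuming Linux nodes and the usual platform names used for container images. The helper name `docker_arch` is hypothetical. As far as I know, the `cockroachdb/cockroach:v19.2.4` image is published for amd64 only, so any other architecture on a node would be consistent with the "exec format error" above; treat that as an assumption to verify against the image registry.

```shell
# Hypothetical helper: map `uname -m` output to the platform name
# commonly used for container images.
docker_arch() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    armv7l|armv6l) echo arm ;;
    *)             echo "$1" ;;   # unknown value: pass it through unchanged
  esac
}

# Architecture reported by each node (needs access to the cluster):
#   kubectl get nodes -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture
# Architecture of the machine this is run on:
#   docker_arch "$(uname -m)"
```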

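A 2-node CockroachDB cluster is an anti-pattern. You need 3 or more nodes to avoid data unavailability or cluster-wide unavailability when a single node fails. Consider watching these videos, which explain how data is organized in CockroachDB and how the nodes in a cluster work together to keep data available when a node fails.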
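The arithmetic behind this can be sketched in a few lines. CockroachDB replicates data with Raft, which needs a majority of replicas to be reachable; the function names below are made up for illustration.

```shell
# Majority needed among n replicas: floor(n/2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

# Replicas that can be lost while a majority still remains.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 2 3 5; do
  echo "replicas=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 2 replicas the majority is 2, so losing either one makes the data unavailable; with 3 replicas, one can fail and a majority of 2 remains.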

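Only with 3 nodes (or more) do you avoid the risk of losing data if any one node fails. Beyond that, explaining how to do it right is easier than finding out what went wrong, and to find out what went wrong, one has to look at the logs.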

If you attach the logs, I can take a look.

I have also written a detailed guide as the "doing it right" part of my answer. I have elaborated the whole process here.
