I am trying to deploy Elasticsearch on an RKE cluster, following the instructions in this tutorial:
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html
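For reference, what I applied from the quickstart essentially boils down to the manifest below (a minimal sketch; the operator itself was installed as per the tutorial, and the cluster name, version, and nodeSet name match the pods shown further down — only the count of 3 is inferred from my three pods):

kubectl apply -f - <<EOF
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.15.0
  nodeSets:
  - name: default
    count: 3
EOF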
Kubernetes is deployed on VMs that sit behind a proxy.
Since there is no storage provisioner, I provisioned the PVs myself; that part is not the problem.
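Each PV I created looks roughly like this (an illustrative sketch only; the name, capacity, and hostPath are placeholders, not my exact values):

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-data-0
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/es-data
EOF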
The error I get is the following:
Readiness probe failed: {"timestamp": "2021-10-06T12:44:37+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Also, if I run curl on the master node, I get a different error:
curl https://127.0.0.0:9200
curl: (56) Received HTTP code 503 from proxy after CONNECT
Inside the container I get:
bash-4.4# curl https://127.0.0.0:9200
could not connect to the server
Also inside the container:
curl https://0.0.0.0:9200
curl: (56) Received HTTP code 503 from proxy after CONNECT
I am sure the readiness probe fails while executing the curl command, which is part of the script /mnt/elastic-internal/scripts/readiness-probe-script.sh
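curl_rc 7 means curl could not connect at all, and the "503 from proxy after CONNECT" suggests traffic to the loopback address is being sent through the corporate proxy: if the shell or container inherits HTTPS_PROXY, curl will tunnel even local requests through it unless NO_PROXY covers those addresses. A diagnostic sketch I used (standard proxy environment variables and curl flags, nothing ECK-specific):

# show any proxy settings visible to the shell / container
env | grep -i proxy

# retry against the local ES port while explicitly bypassing the proxy
curl -vk --noproxy '*' https://127.0.0.1:9200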
I attach the script and the output of kubectl describe for the pod:
The script:
#!/usr/bin/env bash

# fail should be called as a last resort to help the user to understand why the probe failed
function fail {
  timestamp=$(date --iso-8601=seconds)
  echo "{\"timestamp\": \"${timestamp}\", \"message\": \"readiness probe failed\", ${1}}" | tee /proc/1/fd/2 2> /dev/null
  exit 1
}

labels="/mnt/elastic-internal/downward-api/labels"

version=""
if [[ -f "${labels}" ]]; then
  # get Elasticsearch version from the downward API
  version=$(grep "elasticsearch.k8s.elastic.co/version" ${labels} | cut -d '=' -f 2)
  # remove quotes
  version=$(echo "${version}" | tr -d '"')
fi

READINESS_PROBE_TIMEOUT=${READINESS_PROBE_TIMEOUT:=3}

# Check if PROBE_PASSWORD_PATH is set, otherwise fall back to its former name in 1.0.0.beta-1: PROBE_PASSWORD_FILE
if [[ -z "${PROBE_PASSWORD_PATH}" ]]; then
  probe_password_path="${PROBE_PASSWORD_FILE}"
else
  probe_password_path="${PROBE_PASSWORD_PATH}"
fi

# setup basic auth if credentials are available
if [ -n "${PROBE_USERNAME}" ] && [ -f "${probe_password_path}" ]; then
  PROBE_PASSWORD=$(<${probe_password_path})
  BASIC_AUTH="-u ${PROBE_USERNAME}:${PROBE_PASSWORD}"
else
  BASIC_AUTH=''
fi

# Check if we are using IPv6
if [[ $POD_IP =~ .*:.* ]]; then
  LOOPBACK="[::1]"
else
  LOOPBACK=127.0.0.1
fi

# request Elasticsearch on /
# we are turning globbing off to allow for unescaped [] in case of IPv6
ENDPOINT="${READINESS_PROBE_PROTOCOL:-https}://${LOOPBACK}:9200/"
status=$(curl -o /dev/null -w "%{http_code}" --max-time ${READINESS_PROBE_TIMEOUT} -XGET -g -s -k ${BASIC_AUTH} $ENDPOINT)
curl_rc=$?

if [[ ${curl_rc} -ne 0 ]]; then
  fail "\"curl_rc\": \"${curl_rc}\""
fi

# ready if status code 200, 503 is tolerable if ES version is 6.x
if [[ ${status} == "200" ]] || [[ ${status} == "503" && ${version:0:2} == "6." ]]; then
  exit 0
else
  fail "\"status\": \"${status}\", \"version\": \"${version}\""
fi
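To reproduce the failure by hand, the probe script can be run directly inside the Pod (assuming kubectl access to the default namespace; the Pod is Running even though it is not Ready, so exec works):

kubectl exec -it quickstart-es-default-0 -- bash /mnt/elastic-internal/scripts/readiness-probe-script.sh
echo $?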
And here is the output of kubectl describe for the pod:
Name: quickstart-es-default-0
Namespace: default
Priority: 0
Node: rke-worker-1/10.21.242.216
Start Time: Wed, 06 Oct 2021 14:43:11 +0200
Labels: common.k8s.elastic.co/type=elasticsearch
controller-revision-hash=quickstart-es-default-666db95c77
elasticsearch.k8s.elastic.co/cluster-name=quickstart
elasticsearch.k8s.elastic.co/config-hash=2374451611
elasticsearch.k8s.elastic.co/http-scheme=https
elasticsearch.k8s.elastic.co/node-data=true
elasticsearch.k8s.elastic.co/node-data_cold=true
elasticsearch.k8s.elastic.co/node-data_content=true
elasticsearch.k8s.elastic.co/node-data_hot=true
elasticsearch.k8s.elastic.co/node-data_warm=true
elasticsearch.k8s.elastic.co/node-ingest=true
elasticsearch.k8s.elastic.co/node-master=true
elasticsearch.k8s.elastic.co/node-ml=true
elasticsearch.k8s.elastic.co/node-remote_cluster_client=true
elasticsearch.k8s.elastic.co/node-transform=true
elasticsearch.k8s.elastic.co/node-voting_only=false
elasticsearch.k8s.elastic.co/statefulset-name=quickstart-es-default
elasticsearch.k8s.elastic.co/version=7.15.0
statefulset.kubernetes.io/pod-name=quickstart-es-default-0
Annotations: cni.projectcalico.org/containerID: 1e03a07fc3a1cb37902231b69a5f0fcaed2d450137cb675c5dfb393af185a258
cni.projectcalico.org/podIP: 10.42.2.7/32
cni.projectcalico.org/podIPs: 10.42.2.7/32
co.elastic.logs/module: elasticsearch
update.k8s.elastic.co/timestamp: 2021-10-06T12:43:23.93263325Z
Status: Running
IP: 10.42.2.7
IPs:
IP: 10.42.2.7
Controlled By: StatefulSet/quickstart-es-default
Init Containers:
elastic-internal-init-filesystem:
Container ID: docker://cc72c63cb1bb5406a2edbcc0488065c06a130f00a73d2e38544cd7e9754fbc57
Image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
Image ID: docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1
Port: <none>
Host Port: <none>
Command:
bash
-c
/mnt/elastic-internal/scripts/prepare-fs.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 06 Oct 2021 14:43:20 +0200
Finished: Wed, 06 Oct 2021 14:43:42 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_IP: (v1:status.podIP)
POD_NAME: quickstart-es-default-0 (v1:metadata.name)
NODE_NAME: (v1:spec.nodeName)
NAMESPACE: default (v1:metadata.namespace)
HEADLESS_SERVICE_NAME: quickstart-es-default
Mounts:
/mnt/elastic-internal/downward-api from downward-api (ro)
/mnt/elastic-internal/elasticsearch-bin-local from elastic-internal-elasticsearch-bin-local (rw)
/mnt/elastic-internal/elasticsearch-config from elastic-internal-elasticsearch-config (ro)
/mnt/elastic-internal/elasticsearch-config-local from elastic-internal-elasticsearch-config-local (rw)
/mnt/elastic-internal/elasticsearch-plugins-local from elastic-internal-elasticsearch-plugins-local (rw)
/mnt/elastic-internal/probe-user from elastic-internal-probe-user (ro)
/mnt/elastic-internal/scripts from elastic-internal-scripts (ro)
/mnt/elastic-internal/transport-certificates from elastic-internal-transport-certificates (ro)
/mnt/elastic-internal/unicast-hosts from elastic-internal-unicast-hosts (ro)
/mnt/elastic-internal/xpack-file-realm from elastic-internal-xpack-file-realm (ro)
/usr/share/elasticsearch/config/http-certs from elastic-internal-http-certificates (ro)
/usr/share/elasticsearch/config/transport-remote-certs/ from elastic-internal-remote-certificate-authorities (ro)
/usr/share/elasticsearch/data from elasticsearch-data (rw)
/usr/share/elasticsearch/logs from elasticsearch-logs (rw)
Containers:
elasticsearch:
Container ID: docker://9fb879f9f0404a9997b5aa0ae915c788569c85abd008617447422ba5de559b54
Image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
Image ID: docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1
Ports: 9200/TCP, 9300/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 06 Oct 2021 14:46:26 +0200
Last State: Terminated
Reason: Error
Exit Code: 134
Started: Wed, 06 Oct 2021 14:43:46 +0200
Finished: Wed, 06 Oct 2021 14:46:22 +0200
Ready: False
Restart Count: 1
Limits:
memory: 2Gi
Requests:
memory: 2Gi
Readiness: exec [bash -c /mnt/elastic-internal/scripts/readiness-probe-script.sh] delay=10s timeout=5s period=5s #success=1 #failure=3
Environment:
POD_IP: (v1:status.podIP)
POD_NAME: quickstart-es-default-0 (v1:metadata.name)
NODE_NAME: (v1:spec.nodeName)
NAMESPACE: default (v1:metadata.namespace)
PROBE_PASSWORD_PATH: /mnt/elastic-internal/probe-user/elastic-internal-probe
PROBE_USERNAME: elastic-internal-probe
READINESS_PROBE_PROTOCOL: https
HEADLESS_SERVICE_NAME: quickstart-es-default
NSS_SDB_USE_CACHE: no
Mounts:
/mnt/elastic-internal/downward-api from downward-api (ro)
/mnt/elastic-internal/elasticsearch-config from elastic-internal-elasticsearch-config (ro)
/mnt/elastic-internal/probe-user from elastic-internal-probe-user (ro)
/mnt/elastic-internal/scripts from elastic-internal-scripts (ro)
/mnt/elastic-internal/unicast-hosts from elastic-internal-unicast-hosts (ro)
/mnt/elastic-internal/xpack-file-realm from elastic-internal-xpack-file-realm (ro)
/usr/share/elasticsearch/bin from elastic-internal-elasticsearch-bin-local (rw)
/usr/share/elasticsearch/config from elastic-internal-elasticsearch-config-local (rw)
/usr/share/elasticsearch/config/http-certs from elastic-internal-http-certificates (ro)
/usr/share/elasticsearch/config/transport-certs from elastic-internal-transport-certificates (ro)
/usr/share/elasticsearch/config/transport-remote-certs/ from elastic-internal-remote-certificate-authorities (ro)
/usr/share/elasticsearch/data from elasticsearch-data (rw)
/usr/share/elasticsearch/logs from elasticsearch-logs (rw)
/usr/share/elasticsearch/plugins from elastic-internal-elasticsearch-plugins-local (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
elasticsearch-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: elasticsearch-data-quickstart-es-default-0
ReadOnly: false
downward-api:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
elastic-internal-elasticsearch-bin-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-elasticsearch-config:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-default-es-config
Optional: false
elastic-internal-elasticsearch-config-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-elasticsearch-plugins-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-http-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-http-certs-internal
Optional: false
elastic-internal-probe-user:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-internal-users
Optional: false
elastic-internal-remote-certificate-authorities:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-remote-ca
Optional: false
elastic-internal-scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: quickstart-es-scripts
Optional: false
elastic-internal-transport-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-default-es-transport-certs
Optional: false
elastic-internal-unicast-hosts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: quickstart-es-unicast-hosts
Optional: false
elastic-internal-xpack-file-realm:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-xpack-file-realm
Optional: false
elasticsearch-logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 22m default-scheduler Successfully assigned default/quickstart-es-default-0 to rke-worker-1
Normal Pulled 21m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.15.0" already present on machine
Normal Created 21m kubelet Created container elastic-internal-init-filesystem
Normal Started 21m kubelet Started container elastic-internal-init-filesystem
Normal Pulled 21m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.15.0" already present on machine
Normal Created 21m kubelet Created container elasticsearch
Normal Started 21m kubelet Started container elasticsearch
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:43:57+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:02+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:07+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:12+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:17+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:22+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:27+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:32+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:37+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 115s (x223 over 20m) kubelet (combined from similar events): Readiness probe failed: {"timestamp": "2021-10-06T13:03:22+00:00", "message": "readiness probe failed", "curl_rc": "7"}
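The container had already restarted once (Restart Count: 1 above), so I pulled the log of the crashed run with the --previous flag:

kubectl logs quickstart-es-default-0 --previous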
After the restart, I get the following output in the logs:
{"type": "deprecation.elasticsearch", "timestamp": "2021-10-07T11:58:28,007Z", "level": "DEPRECATION", "component": "o.e.d.c.r.OperationRouting", "cluster.name": "quickstart", "node.name": "quickstart-es-default-0", "message": "searches will not be routed based on awareness attributes starting in version 8.0.0; to opt into this behaviour now please set the system property [es.search.ignore_awareness_attributes] to [true]", "key": "searches_not_routed_on_awareness_attributes" }
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fc63c3eb122, pid=7, tid=261
#
# JRE version: OpenJDK Runtime Environment Temurin-16.0.2+7 (16.0.2+7) (build 16.0.2+7)
# Java VM: OpenJDK 64-Bit Server VM Temurin-16.0.2+7 (16.0.2+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# J 711 c1 org.yaml.snakeyaml.scanner.Constant.has(I)Z (42 bytes) @ 0x00007fc63c3eb122 [0x00007fc63c3eb100+0x0000000000000022]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /usr/share/elasticsearch/core.7)
#
# An error report file with more information is saved as:
# logs/hs_err_pid7.log
Compiled method (c1) 333657 4806 3 org.yaml.snakeyaml.scanner.Constant::hasNo (15 bytes)
total in heap [0x00007fc63c50c010,0x00007fc63c50c7d0] = 1984
relocation [0x00007fc63c50c170,0x00007fc63c50c1f8] = 136
main code [0x00007fc63c50c200,0x00007fc63c50c620] = 1056
stub code [0x00007fc63c50c620,0x00007fc63c50c680] = 96
oops [0x00007fc63c50c680,0x00007fc63c50c688] = 8
metadata [0x00007fc63c50c688,0x00007fc63c50c6a8] = 32
scopes data [0x00007fc63c50c6a8,0x00007fc63c50c718] = 112
scopes pcs [0x00007fc63c50c718,0x00007fc63c50c7b8] = 160
dependencies [0x00007fc63c50c7b8,0x00007fc63c50c7c0] = 8
nul chk table [0x00007fc63c50c7c0,0x00007fc63c50c7d0] = 16
Compiled method (c1) 333676 4806 3 org.yaml.snakeyaml.scanner.Constant::hasNo (15 bytes)
total in heap [0x00007fc63c50c010,0x00007fc63c50c7d0] = 1984
relocation [0x00007fc63c50c170,0x00007fc63c50c1f8] = 136
main code [0x00007fc63c50c200,0x00007fc63c50c620] = 1056
stub code [0x00007fc63c50c620,0x00007fc63c50c680] = 96
oops [0x00007fc63c50c680,0x00007fc63c50c688] = 8
metadata [0x00007fc63c50c688,0x00007fc63c50c6a8] = 32
scopes data [0x00007fc63c50c6a8,0x00007fc63c50c718] = 112
scopes pcs [0x00007fc63c50c718,0x00007fc63c50c7b8] = 160
dependencies [0x00007fc63c50c7b8,0x00007fc63c50c7c0] = 8
nul chk table [0x00007fc63c50c7c0,0x00007fc63c50c7d0] = 16
Compiled method (c1) 333678 4812 3 org.yaml.snakeyaml.scanner.ScannerImpl::scanLineBreak (99 bytes)
total in heap [0x00007fc63c583990,0x00007fc63c584808] = 3704
relocation [0x00007fc63c583af0,0x00007fc63c583bf8] = 264
main code [0x00007fc63c583c00,0x00007fc63c584420] = 2080
stub code [0x00007fc63c584420,0x00007fc63c5844c0] = 160
oops [0x00007fc63c5844c0,0x00007fc63c5844c8] = 8
metadata [0x00007fc63c5844c8,0x00007fc63c584500] = 56
scopes data [0x00007fc63c584500,0x00007fc63c5845f0] = 240
scopes pcs [0x00007fc63c5845f0,0x00007fc63c5847b0] = 448
dependencies [0x00007fc63c5847b0,0x00007fc63c5847b8] = 8
nul chk table [0x00007fc63c5847b8,0x00007fc63c584808] = 80
Compiled method (c1) 333679 4693 2 java.lang.String::indexOf (7 bytes)
total in heap [0x00007fc63c6e0190,0x00007fc63c6e0568] = 984
relocation [0x00007fc63c6e02f0,0x00007fc63c6e0338] = 72
main code [0x00007fc63c6e0340,0x00007fc63c6e0480] = 320
stub code [0x00007fc63c6e0480,0x00007fc63c6e04d0] = 80
metadata [0x00007fc63c6e04d0,0x00007fc63c6e04e0] = 16
scopes data [0x00007fc63c6e04e0,0x00007fc63c6e0510] = 48
scopes pcs [0x00007fc63c6e0510,0x00007fc63c6e0560] = 80
dependencies [0x00007fc63c6e0560,0x00007fc63c6e0568] = 8
#
# If you would like to submit a bug report, please visit:
# https://github.com/adoptium/adoptium-support/issues
#
The solution to my problem was so simple that I did not expect it.
I narrowed the problem down to a failing TLS handshake.
The times on the nodes were different. I synchronized the time and date on all nodes, and all the problems disappeared: they were caused by this clock skew.
The proxy was blocking services such as NTP from synchronizing the time.
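To verify and fix the skew I compared the clocks and forced a sync on every node (a sketch; the hostnames other than rke-worker-1 are hypothetical, and chrony is an assumption — any NTP client that can reach a time source through or around the proxy works):

# compare wall-clock time across the nodes
for host in rke-master rke-worker-1 rke-worker-2; do
  ssh "$host" date -u
done

# on each node: check sync status and force an immediate correction
timedatectl status
chronyc makestep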
kubectl get pods
NAME READY STATUS RESTARTS AGE
quickstart-es-default-0 1/1 Running 0 3m2s
quickstart-es-default-1 1/1 Running 0 3m2s
quickstart-es-default-2 1/1 Running 0 3m2s
kubectl get elasticsearch
NAME HEALTH NODES VERSION PHASE AGE
quickstart green 3 7.15.0 Ready 3m21s
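As a final check, the cluster now answers requests. Following the quickstart: retrieve the elastic user's password, port-forward the HTTP service (in a separate shell), and query it:

PASSWORD=$(kubectl get secret quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
kubectl port-forward service/quickstart-es-http 9200 &
curl -u "elastic:$PASSWORD" -k "https://localhost:9200"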