I am trying to deploy Elasticsearch on an RKE cluster, following the instructions in this tutorial:
https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html
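For reference, what I applied from the quickstart essentially boils down to the manifest below (a minimal sketch; the operator itself was installed as per the tutorial, and the cluster name, version, and nodeSet name match the pods shown further down — only the count of 3 is inferred from my three pods):

kubectl apply -f - <<EOF
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.15.0
  nodeSets:
  - name: default
    count: 3
EOF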
Kubernetes is deployed on VMs that sit behind a proxy.
Since there is no storage provisioner, I provisioned the PVs myself; that part is not the problem.
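Each PV I created looks roughly like this (an illustrative sketch only; the name, capacity, and hostPath are placeholders, not my exact values):

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-data-0
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/es-data
EOF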
The error I get is the following:
Readiness probe failed: {"timestamp": "2021-10-06T12:44:37+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Also, if I run curl on the master node, I get a different error:
curl https://127.0.0.0:9200
curl: (56) Received HTTP code 503 from proxy after CONNECT
Inside the container I get:
bash-4.4# curl https://127.0.0.0:9200
could not connect to the server
Also inside the container:
curl https://0.0.0.0:9200
curl: (56) Received HTTP code 503 from proxy after CONNECT
I am sure the readiness probe fails while executing the curl command, which is part of the script /mnt/elastic-internal/scripts/readiness-probe-script.sh
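curl_rc 7 means curl could not connect at all, and the "503 from proxy after CONNECT" suggests traffic to the loopback address is being sent through the corporate proxy: if the shell or container inherits HTTPS_PROXY, curl will tunnel even local requests through it unless NO_PROXY covers those addresses. A diagnostic sketch I used (standard proxy environment variables and curl flags, nothing ECK-specific):

# show any proxy settings visible to the shell / container
env | grep -i proxy

# retry against the local ES port while explicitly bypassing the proxy
curl -vk --noproxy '*' https://127.0.0.1:9200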
I attach the script and the output of kubectl describe for the pod:
The script:
#!/usr/bin/env bash

# fail should be called as a last resort to help the user to understand why the probe failed
function fail {
  timestamp=$(date --iso-8601=seconds)
  echo "{\"timestamp\": \"${timestamp}\", \"message\": \"readiness probe failed\", ${1}}" | tee /proc/1/fd/2 2> /dev/null
  exit 1
}

labels="/mnt/elastic-internal/downward-api/labels"

version=""
if [[ -f "${labels}" ]]; then
  # get Elasticsearch version from the downward API
  version=$(grep "elasticsearch.k8s.elastic.co/version" ${labels} | cut -d '=' -f 2)
  # remove quotes
  version=$(echo "${version}" | tr -d '"')
fi

READINESS_PROBE_TIMEOUT=${READINESS_PROBE_TIMEOUT:=3}

# Check if PROBE_PASSWORD_PATH is set, otherwise fall back to its former name in 1.0.0.beta-1: PROBE_PASSWORD_FILE
if [[ -z "${PROBE_PASSWORD_PATH}" ]]; then
  probe_password_path="${PROBE_PASSWORD_FILE}"
else
  probe_password_path="${PROBE_PASSWORD_PATH}"
fi

# setup basic auth if credentials are available
if [ -n "${PROBE_USERNAME}" ] && [ -f "${probe_password_path}" ]; then
  PROBE_PASSWORD=$(<${probe_password_path})
  BASIC_AUTH="-u ${PROBE_USERNAME}:${PROBE_PASSWORD}"
else
  BASIC_AUTH=''
fi

# Check if we are using IPv6
if [[ $POD_IP =~ .*:.* ]]; then
  LOOPBACK="[::1]"
else
  LOOPBACK=127.0.0.1
fi

# request Elasticsearch on /
# we are turning globbing off to allow for unescaped [] in case of IPv6
ENDPOINT="${READINESS_PROBE_PROTOCOL:-https}://${LOOPBACK}:9200/"
status=$(curl -o /dev/null -w "%{http_code}" --max-time ${READINESS_PROBE_TIMEOUT} -XGET -g -s -k ${BASIC_AUTH} $ENDPOINT)
curl_rc=$?

if [[ ${curl_rc} -ne 0 ]]; then
  fail "\"curl_rc\": \"${curl_rc}\""
fi

# ready if status code 200, 503 is tolerable if ES version is 6.x
if [[ ${status} == "200" ]] || [[ ${status} == "503" && ${version:0:2} == "6." ]]; then
  exit 0
else
  fail "\"status\": \"${status}\", \"version\": \"${version}\""
fi
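To reproduce the failure by hand, the probe script can be run directly inside the Pod (assuming kubectl access to the default namespace; the Pod is Running even though it is not Ready, so exec works):

kubectl exec -it quickstart-es-default-0 -- bash /mnt/elastic-internal/scripts/readiness-probe-script.sh
echo $?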
And here is the output of kubectl describe for the pod:
Name: quickstart-es-default-0
Namespace: default
Priority: 0
Node: rke-worker-1/10.21.242.216
Start Time: Wed, 06 Oct 2021 14:43:11 +0200
Labels: common.k8s.elastic.co/type=elasticsearch
controller-revision-hash=quickstart-es-default-666db95c77
elasticsearch.k8s.elastic.co/cluster-name=quickstart
elasticsearch.k8s.elastic.co/config-hash=2374451611
elasticsearch.k8s.elastic.co/http-scheme=https
elasticsearch.k8s.elastic.co/node-data=true
elasticsearch.k8s.elastic.co/node-data_cold=true
elasticsearch.k8s.elastic.co/node-data_content=true
elasticsearch.k8s.elastic.co/node-data_hot=true
elasticsearch.k8s.elastic.co/node-data_warm=true
elasticsearch.k8s.elastic.co/node-ingest=true
elasticsearch.k8s.elastic.co/node-master=true
elasticsearch.k8s.elastic.co/node-ml=true
elasticsearch.k8s.elastic.co/node-remote_cluster_client=true
elasticsearch.k8s.elastic.co/node-transform=true
elasticsearch.k8s.elastic.co/node-voting_only=false
elasticsearch.k8s.elastic.co/statefulset-name=quickstart-es-default
elasticsearch.k8s.elastic.co/version=7.15.0
statefulset.kubernetes.io/pod-name=quickstart-es-default-0
Annotations: cni.projectcalico.org/containerID: 1e03a07fc3a1cb37902231b69a5f0fcaed2d450137cb675c5dfb393af185a258
cni.projectcalico.org/podIP: 10.42.2.7/32
cni.projectcalico.org/podIPs: 10.42.2.7/32
co.elastic.logs/module: elasticsearch
update.k8s.elastic.co/timestamp: 2021-10-06T12:43:23.93263325Z
Status: Running
IP: 10.42.2.7
IPs:
IP: 10.42.2.7
Controlled By: StatefulSet/quickstart-es-default
Init Containers:
elastic-internal-init-filesystem:
Container ID: docker://cc72c63cb1bb5406a2edbcc0488065c06a130f00a73d2e38544cd7e9754fbc57
Image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
Image ID: docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1
Port: <none>
Host Port: <none>
Command:
bash
-c
/mnt/elastic-internal/scripts/prepare-fs.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 06 Oct 2021 14:43:20 +0200
Finished: Wed, 06 Oct 2021 14:43:42 +0200
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_IP: (v1:status.podIP)
POD_NAME: quickstart-es-default-0 (v1:metadata.name)
NODE_NAME: (v1:spec.nodeName)
NAMESPACE: default (v1:metadata.namespace)
HEADLESS_SERVICE_NAME: quickstart-es-default
Mounts:
/mnt/elastic-internal/downward-api from downward-api (ro)
/mnt/elastic-internal/elasticsearch-bin-local from elastic-internal-elasticsearch-bin-local (rw)
/mnt/elastic-internal/elasticsearch-config from elastic-internal-elasticsearch-config (ro)
/mnt/elastic-internal/elasticsearch-config-local from elastic-internal-elasticsearch-config-local (rw)
/mnt/elastic-internal/elasticsearch-plugins-local from elastic-internal-elasticsearch-plugins-local (rw)
/mnt/elastic-internal/probe-user from elastic-internal-probe-user (ro)
/mnt/elastic-internal/scripts from elastic-internal-scripts (ro)
/mnt/elastic-internal/transport-certificates from elastic-internal-transport-certificates (ro)
/mnt/elastic-internal/unicast-hosts from elastic-internal-unicast-hosts (ro)
/mnt/elastic-internal/xpack-file-realm from elastic-internal-xpack-file-realm (ro)
/usr/share/elasticsearch/config/http-certs from elastic-internal-http-certificates (ro)
/usr/share/elasticsearch/config/transport-remote-certs/ from elastic-internal-remote-certificate-authorities (ro)
/usr/share/elasticsearch/data from elasticsearch-data (rw)
/usr/share/elasticsearch/logs from elasticsearch-logs (rw)
Containers:
elasticsearch:
Container ID: docker://9fb879f9f0404a9997b5aa0ae915c788569c85abd008617447422ba5de559b54
Image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
Image ID: docker-pullable://docker.elastic.co/elasticsearch/elasticsearch@sha256:6ae227c688e05f7d487e0cfe08a5a3681f4d60d006ad9b5a1f72a741d6091df1
Ports: 9200/TCP, 9300/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Wed, 06 Oct 2021 14:46:26 +0200
Last State: Terminated
Reason: Error
Exit Code: 134
Started: Wed, 06 Oct 2021 14:43:46 +0200
Finished: Wed, 06 Oct 2021 14:46:22 +0200
Ready: False
Restart Count: 1
Limits:
memory: 2Gi
Requests:
memory: 2Gi
Readiness: exec [bash -c /mnt/elastic-internal/scripts/readiness-probe-script.sh] delay=10s timeout=5s period=5s #success=1 #failure=3
Environment:
POD_IP: (v1:status.podIP)
POD_NAME: quickstart-es-default-0 (v1:metadata.name)
NODE_NAME: (v1:spec.nodeName)
NAMESPACE: default (v1:metadata.namespace)
PROBE_PASSWORD_PATH: /mnt/elastic-internal/probe-user/elastic-internal-probe
PROBE_USERNAME: elastic-internal-probe
READINESS_PROBE_PROTOCOL: https
HEADLESS_SERVICE_NAME: quickstart-es-default
NSS_SDB_USE_CACHE: no
Mounts:
/mnt/elastic-internal/downward-api from downward-api (ro)
/mnt/elastic-internal/elasticsearch-config from elastic-internal-elasticsearch-config (ro)
/mnt/elastic-internal/probe-user from elastic-internal-probe-user (ro)
/mnt/elastic-internal/scripts from elastic-internal-scripts (ro)
/mnt/elastic-internal/unicast-hosts from elastic-internal-unicast-hosts (ro)
/mnt/elastic-internal/xpack-file-realm from elastic-internal-xpack-file-realm (ro)
/usr/share/elasticsearch/bin from elastic-internal-elasticsearch-bin-local (rw)
/usr/share/elasticsearch/config from elastic-internal-elasticsearch-config-local (rw)
/usr/share/elasticsearch/config/http-certs from elastic-internal-http-certificates (ro)
/usr/share/elasticsearch/config/transport-certs from elastic-internal-transport-certificates (ro)
/usr/share/elasticsearch/config/transport-remote-certs/ from elastic-internal-remote-certificate-authorities (ro)
/usr/share/elasticsearch/data from elasticsearch-data (rw)
/usr/share/elasticsearch/logs from elasticsearch-logs (rw)
/usr/share/elasticsearch/plugins from elastic-internal-elasticsearch-plugins-local (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
elasticsearch-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: elasticsearch-data-quickstart-es-default-0
ReadOnly: false
downward-api:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
elastic-internal-elasticsearch-bin-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-elasticsearch-config:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-default-es-config
Optional: false
elastic-internal-elasticsearch-config-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-elasticsearch-plugins-local:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
elastic-internal-http-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-http-certs-internal
Optional: false
elastic-internal-probe-user:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-internal-users
Optional: false
elastic-internal-remote-certificate-authorities:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-remote-ca
Optional: false
elastic-internal-scripts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: quickstart-es-scripts
Optional: false
elastic-internal-transport-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-default-es-transport-certs
Optional: false
elastic-internal-unicast-hosts:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: quickstart-es-unicast-hosts
Optional: false
elastic-internal-xpack-file-realm:
Type: Secret (a volume populated by a Secret)
SecretName: quickstart-es-xpack-file-realm
Optional: false
elasticsearch-logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 22m default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 22m default-scheduler Successfully assigned default/quickstart-es-default-0 to rke-worker-1
Normal Pulled 21m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.15.0" already present on machine
Normal Created 21m kubelet Created container elastic-internal-init-filesystem
Normal Started 21m kubelet Started container elastic-internal-init-filesystem
Normal Pulled 21m kubelet Container image "docker.elastic.co/elasticsearch/elasticsearch:7.15.0" already present on machine
Normal Created 21m kubelet Created container elasticsearch
Normal Started 21m kubelet Started container elasticsearch
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:43:57+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:02+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:07+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:12+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 21m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:17+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:22+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:27+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:32+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 20m kubelet Readiness probe failed: {"timestamp": "2021-10-06T12:44:37+00:00", "message": "readiness probe failed", "curl_rc": "7"}
Warning Unhealthy 115s (x223 over 20m) kubelet (combined from similar events): Readiness probe failed: {"timestamp": "2021-10-06T13:03:22+00:00", "message": "readiness probe failed", "curl_rc": "7"}
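The container had already restarted once (Restart Count: 1 above), so I pulled the log of the crashed run with the --previous flag:

kubectl logs quickstart-es-default-0 --previous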
After the restart, I get the following output in the logs:
{"type": "deprecation.elasticsearch", "timestamp": "2021-10-07T11:58:28,007Z", "level": "DEPRECATION", "component": "o.e.d.c.r.OperationRouting", "cluster.name": "quickstart", "node.name": "quickstart-es-default-0", "message": "searches will not be routed based on awareness attributes starting in version 8.0.0; to opt into this behaviour now please set the system property [es.search.ignore_awareness_attributes] to [true]", "key": "searches_not_routed_on_awareness_attributes" }
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fc63c3eb122, pid=7, tid=261
#
# JRE version: OpenJDK Runtime Environment Temurin-16.0.2+7 (16.0.2+7) (build 16.0.2+7)
# Java VM: OpenJDK 64-Bit Server VM Temurin-16.0.2+7 (16.0.2+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# J 711 c1 org.yaml.snakeyaml.scanner.Constant.has(I)Z (42 bytes) @ 0x00007fc63c3eb122 [0x00007fc63c3eb100+0x0000000000000022]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /usr/share/elasticsearch/core.7)
#
# An error report file with more information is saved as:
# logs/hs_err_pid7.log
Compiled method (c1) 333657 4806 3 org.yaml.snakeyaml.scanner.Constant::hasNo (15 bytes)
total in heap [0x00007fc63c50c010,0x00007fc63c50c7d0] = 1984
relocation [0x00007fc63c50c170,0x00007fc63c50c1f8] = 136
main code [0x00007fc63c50c200,0x00007fc63c50c620] = 1056
stub code [0x00007fc63c50c620,0x00007fc63c50c680] = 96
oops [0x00007fc63c50c680,0x00007fc63c50c688] = 8
metadata [0x00007fc63c50c688,0x00007fc63c50c6a8] = 32
scopes data [0x00007fc63c50c6a8,0x00007fc63c50c718] = 112
scopes pcs [0x00007fc63c50c718,0x00007fc63c50c7b8] = 160
dependencies [0x00007fc63c50c7b8,0x00007fc63c50c7c0] = 8
nul chk table [0x00007fc63c50c7c0,0x00007fc63c50c7d0] = 16
Compiled method (c1) 333676 4806 3 org.yaml.snakeyaml.scanner.Constant::hasNo (15 bytes)
total in heap [0x00007fc63c50c010,0x00007fc63c50c7d0] = 1984
relocation [0x00007fc63c50c170,0x00007fc63c50c1f8] = 136
main code [0x00007fc63c50c200,0x00007fc63c50c620] = 1056
stub code [0x00007fc63c50c620,0x00007fc63c50c680] = 96
oops [0x00007fc63c50c680,0x00007fc63c50c688] = 8
metadata [0x00007fc63c50c688,0x00007fc63c50c6a8] = 32
scopes data [0x00007fc63c50c6a8,0x00007fc63c50c718] = 112
scopes pcs [0x00007fc63c50c718,0x00007fc63c50c7b8] = 160
dependencies [0x00007fc63c50c7b8,0x00007fc63c50c7c0] = 8
nul chk table [0x00007fc63c50c7c0,0x00007fc63c50c7d0] = 16
Compiled method (c1) 333678 4812 3 org.yaml.snakeyaml.scanner.ScannerImpl::scanLineBreak (99 bytes)
total in heap [0x00007fc63c583990,0x00007fc63c584808] = 3704
relocation [0x00007fc63c583af0,0x00007fc63c583bf8] = 264
main code [0x00007fc63c583c00,0x00007fc63c584420] = 2080
stub code [0x00007fc63c584420,0x00007fc63c5844c0] = 160
oops [0x00007fc63c5844c0,0x00007fc63c5844c8] = 8
metadata [0x00007fc63c5844c8,0x00007fc63c584500] = 56
scopes data [0x00007fc63c584500,0x00007fc63c5845f0] = 240
scopes pcs [0x00007fc63c5845f0,0x00007fc63c5847b0] = 448
dependencies [0x00007fc63c5847b0,0x00007fc63c5847b8] = 8
nul chk table [0x00007fc63c5847b8,0x00007fc63c584808] = 80
Compiled method (c1) 333679 4693 2 java.lang.String::indexOf (7 bytes)
total in heap [0x00007fc63c6e0190,0x00007fc63c6e0568] = 984
relocation [0x00007fc63c6e02f0,0x00007fc63c6e0338] = 72
main code [0x00007fc63c6e0340,0x00007fc63c6e0480] = 320
stub code [0x00007fc63c6e0480,0x00007fc63c6e04d0] = 80
metadata [0x00007fc63c6e04d0,0x00007fc63c6e04e0] = 16
scopes data [0x00007fc63c6e04e0,0x00007fc63c6e0510] = 48
scopes pcs [0x00007fc63c6e0510,0x00007fc63c6e0560] = 80
dependencies [0x00007fc63c6e0560,0x00007fc63c6e0568] = 8
#
# If you would like to submit a bug report, please visit:
# https://github.com/adoptium/adoptium-support/issues
#
The solution to my problem was so simple that I did not expect it.
I narrowed the problem down to a failing TLS handshake.
The times on the nodes were different. I synchronized the time and date on all nodes, and all the problems disappeared: they were caused by this clock skew.
The proxy was blocking services such as NTP from synchronizing the time.
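To verify and fix the skew I compared the clocks and forced a sync on every node (a sketch; the hostnames other than rke-worker-1 are hypothetical, and chrony is an assumption — any NTP client that can reach a time source through or around the proxy works):

# compare wall-clock time across the nodes
for host in rke-master rke-worker-1 rke-worker-2; do
  ssh "$host" date -u
done

# on each node: check sync status and force an immediate correction
timedatectl status
chronyc makestep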
kubectl get pods
NAME READY STATUS RESTARTS AGE
quickstart-es-default-0 1/1 Running 0 3m2s
quickstart-es-default-1 1/1 Running 0 3m2s
quickstart-es-default-2 1/1 Running 0 3m2s
kubectl get elasticsearch
NAME HEALTH NODES VERSION PHASE AGE
quickstart green 3 7.15.0 Ready 3m21s
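As a final check, the cluster now answers requests. Following the quickstart: retrieve the elastic user's password, port-forward the HTTP service (in a separate shell), and query it:

PASSWORD=$(kubectl get secret quickstart-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
kubectl port-forward service/quickstart-es-http 9200 &
curl -u "elastic:$PASSWORD" -k "https://localhost:9200"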