elasticsearch: how to reinitialize a node

Elasticsearch 1.7.2 on CentOS

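We had a 3-node cluster that was running well. A network problem caused node "B" to lose network access. (We then discovered that node C had "minimum_master_nodes" set to 1 instead of 2.)

So we are now limping along on node A alone.

We fixed the problems on nodes B and C, but they refuse to rejoin the cluster. On B and C: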

# curl -XGET http://localhost:9200/_cluster/health?pretty=true
{
  "error" : "MasterNotDiscoveredException[waited for [30s]]",
  "status" : 503
}

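The elasticsearch.yml is as follows (on the "b" and "c" nodes the name reflects the node naming on those systems; also, the IP addresses on each node point at the other 2 nodes. However, on "c", index.number_of_replicas was mistakenly set to 1.)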

cluster.name: elasticsearch-prod
node.name: "PROD-node-3a"
node.master: true
index.number_of_replicas: 2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.3.100", "192.168.3.101"]
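
For reference, the usual guidance for discovery.zen.minimum_master_nodes is a majority of the master-eligible nodes, (N / 2) + 1 with integer division, which gives 2 for three nodes. A minimal sketch of that arithmetic follows; the function names quorum and check_min_master are made up for illustration and are not part of Elasticsearch.

```shell
#!/bin/sh
# quorum N: print the majority of N master-eligible nodes, i.e. floor(N/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# check_min_master CONFIGURED N: succeed only if CONFIGURED equals the majority of N.
check_min_master() {
    [ "$1" -eq "$(quorum "$2")" ]
}

# Example: a value of 1 on a 3-node cluster (what we found on node C) fails the check.
check_min_master 1 3 || echo "minimum_master_nodes should be $(quorum 3) for 3 master-eligible nodes"
```

With a value of 1, a node that is cut off from the others can elect itself master on its own, which is the split-brain case the setting is meant to prevent.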

We have no idea why they won't join. They have network visibility to A, and A can see them. Each node correctly lists the other two nodes in "discovery.zen.ping.unicast.hosts:".
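
The "each node lists the other two" rule can be written down as a small helper, which makes it easy to compare against what is actually in each elasticsearch.yml. This is only a sketch: unicast_hosts is a hypothetical function name, and 192.168.3.102 is an assumed address for the third node (only .100 and .101 appear in the config above).

```shell
#!/bin/sh
# unicast_hosts SELF HOST...: print the discovery.zen.ping.unicast.hosts line
# for the node whose address is SELF, listing every other HOST.
unicast_hosts() {
    self=$1
    shift
    list=""
    for host in "$@"; do
        # Skip the node's own address.
        [ "$host" = "$self" ] && continue
        # Append the quoted host, comma-separated after the first entry.
        list="${list:+$list, }\"$host\""
    done
    printf 'discovery.zen.ping.unicast.hosts: [%s]\n' "$list"
}

# Example (192.168.3.102 is an assumed address for this node):
unicast_hosts 192.168.3.102 192.168.3.100 192.168.3.101 192.168.3.102
```

The printed line can then be diffed against the corresponding line in the node's own config file.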

On B and C the log is very sparse and tells us nothing:

# cat elasticsearch.log
[2015-09-24 20:07:46,686][INFO ][node                     ] [The Profile] version[1.7.2], pid[866], build[e43676b/2015-09-14T09:49:53Z]
[2015-09-24 20:07:46,688][INFO ][node                     ] [The Profile] initializing ...
[2015-09-24 20:07:46,931][INFO ][plugins                  ] [The Profile] loaded [], sites []
[2015-09-24 20:07:47,054][INFO ][env                      ] [The Profile] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [148.7gb], net total_space [157.3gb], types [rootfs]
[2015-09-24 20:07:50,696][INFO ][node                     ] [The Profile] initialized
[2015-09-24 20:07:50,697][INFO ][node                     ] [The Profile] starting ...
[2015-09-24 20:07:50,942][INFO ][transport                ] [The Profile] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.181.3.138:9300]}
[2015-09-24 20:07:50,983][INFO ][discovery                ] [The Profile] elasticsearch/PojoIp-ZTXufX_Lxlwvdew
[2015-09-24 20:07:54,772][INFO ][cluster.service          ] [The Profile] new_master [The Profile][PojoIp-ZTXufX_Lxlwvdew][elastic-search-3c-prod-centos-case-48307][inet[/10.181.3.138:9300]], reason: zen-disco-join (elected_as_master)
[2015-09-24 20:07:54,801][INFO ][http                     ] [The Profile] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.181.3.138:9200]}
[2015-09-24 20:07:54,802][INFO ][node                     ] [The Profile] started
[2015-09-24 20:07:54,880][INFO ][gateway                  ] [The Profile] recovered [0] indices into cluster_state
[2015-09-24 20:42:45,691][INFO ][node                     ] [The Profile] stopping ...
[2015-09-24 20:42:45,727][INFO ][node                     ] [The Profile] stopped
[2015-09-24 20:42:45,727][INFO ][node                     ] [The Profile] closing ...
[2015-09-24 20:42:45,735][INFO ][node                     ] [The Profile] closed
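
One thing that can be pulled out of a log like this is the cluster name the node actually started with, for comparison with cluster.name in elasticsearch.yml. The sketch below assumes that the [discovery] line in Elasticsearch 1.x has the form cluster_name/node_id after the bracketed node name, which is how the excerpt above reads; cluster_from_log is a made-up helper name.

```shell
#!/bin/sh
# cluster_from_log: read Elasticsearch log lines on stdin and print the text
# before the "/" on the first [discovery] line (assumed to be the cluster name).
cluster_from_log() {
    sed -n 's|^.*]\[discovery *] \[[^]]*] \([^/]*\)/.*$|\1|p' | head -n 1
}

# Usage (not run here):
#   cluster_from_log < elasticsearch.log
```

If the printed value differs from the configured cluster.name, that would suggest the node did not start with the config file we think it did.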

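How do we bring the whole beast back to life?

Restarting B and C makes no difference at all
I am hesitant to cycle A, since that is the node our apps are hitting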

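We don't know exactly what brought it back, but it sort of magically came back to life.

I believe that the shard rerouting (as shown here: elasticsearch: Did I lose data when two of my three nodes went down?) is what caused the nodes to rejoin the cluster. Our theory is that node A, the only surviving node, was not a "healthy" master, because it knew that one shard was unallocated (the "p" (primary) copy of shard 1, as described here: elasticsearch: Did I lose data when two of my three nodes went down?).

Because the master knew it was not whole, the other nodes declined to join the cluster and threw "MasterNotDiscoveredException".

Once we had assigned all the "p" shards to the surviving node A, the other nodes joined up and did the whole replication dance.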

However, allocating the shards this way lost data. We ended up building a new cluster and are rebuilding the index (which takes several days).
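
For anyone in the same spot, the allocation step can be sketched roughly as below, using the "allocate" command of the cluster reroute API as it exists in Elasticsearch 1.x. This is a sketch and not the exact commands we ran: the helper names allocate_body and force_allocate are made up, and the index name my-index in the usage line is a placeholder.

```shell
#!/bin/sh
# allocate_body INDEX SHARD NODE: print the JSON body for a cluster reroute
# "allocate" command. allow_primary=true lets a primary be allocated even when
# no good copy is available, which can create an empty shard and lose data.
allocate_body() {
    printf '{"commands":[{"allocate":{"index":"%s","shard":%d,"node":"%s","allow_primary":true}}]}\n' "$1" "$2" "$3"
}

# force_allocate INDEX SHARD NODE: send the command to the local node.
# Defined only; nothing is sent unless this function is called explicitly.
force_allocate() {
    curl -XPOST 'http://localhost:9200/_cluster/reroute?pretty' -d "$(allocate_body "$1" "$2" "$3")"
}

# Usage (placeholder index name, not run here):
#   force_allocate my-index 1 PROD-node-3a
```

Printing the body with allocate_body first and reading it over before sending anything is the cautious way to use this, given the data loss described above.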
