I have a 2-node cluster.
I'm trying to understand the best practice for connecting to the nodes, and to check failover when one node is down.
From the documentation:
es = Elasticsearch(
['esnode1', 'esnode2'],
# sniff before doing anything
sniff_on_start=True,
# refresh nodes after a node fails to respond
sniff_on_connection_fail=True,
# and also every 60 seconds
sniffer_timeout=60
)
So I tried to connect to my nodes like this:
client = Elasticsearch([ip1, ip2], sniff_on_start=True, sniffer_timeout=10, sniff_on_connection_fail=True)
where ip1/ip2 are the machines' IPs (e.g. 10.0.0.1, 10.0.0.2).
To test it, I shut down ip2 (or used an address that doesn't exist), and now whenever I try to connect I always get:
TransportError: TransportError(N/A, 'Unable to sniff hosts - no viable hosts found.')
even though ip1 is up.
If I connect like this instead:
es = Elasticsearch([ip1, ip2])
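then I can see in the logs that when the client gets no response from ip2, it moves on to ip1 and returns a valid response.
Am I missing something here? I thought that with sniffing enabled, if one of the nodes is down, the client would not raise any exception and would keep working with the live node (until the next sniff).
When I set sniff_on_start to True, I get this: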
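One way to picture the fallback seen in the logs is a client that tries each configured host in turn and moves on when a host does not answer. The sketch below is a toy model under that assumption, not elasticsearch-py's actual connection pool; `FakeHost`, `HostDown`, and `request_with_failover` are hypothetical names used only for illustration.

```python
class HostDown(Exception):
    """Raised by a fake host that is not reachable."""


class FakeHost:
    """Hypothetical stand-in for one node that is either up or down."""

    def __init__(self, name, up):
        self.name = name
        self.up = up

    def request(self, path):
        if not self.up:
            raise HostDown(self.name)
        return {"node": self.name, "path": path}


def request_with_failover(hosts, path):
    """Try each host in order and return the first successful response."""
    errors = []
    for host in hosts:
        try:
            return host.request(path)
        except HostDown as exc:
            # Record the failure and fall through to the next host.
            errors.append(str(exc))
    raise RuntimeError("all hosts failed: " + ", ".join(errors))


# ip2 is down and ip1 is up, so the request still succeeds via ip1.
hosts = [FakeHost("ip2", up=False), FakeHost("ip1", up=True)]
print(request_with_failover(hosts, "/"))  # prints {'node': 'ip1', 'path': '/'}
```

In this model a single dead host only costs one failed attempt; the request fails only when every host is down.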
----> 1 client = Elasticsearch([ip1, ip2],sniff_on_start=True)
/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.pyc in __init__(self, hosts, transport_class, **kwargs)
148 :class:`~elasticsearch.Connection` instances.
149 """
--> 150 self.transport = transport_class(_normalize_hosts(hosts), **kwargs)
151
152 # namespaced clients for compatibility with API names
/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.pyc in __init__(self, hosts, connection_class, connection_pool_class, host_info_callback, sniff_on_start, sniffer_timeout, sniff_timeout, sniff_on_connection_fail, serializer, serializers, default_mimetype, max_retries, retry_on_status, retry_on_timeout, send_get_body_as, **kwargs)
128
129 if sniff_on_start:
--> 130 self.sniff_hosts(True)
131
132 def add_connection(self, host):
/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.pyc in sniff_hosts(self, initial)
235 # transport_schema or host_info_callback blocked all - raise error.
236 if not hosts:
--> 237 raise TransportError("N/A", "Unable to sniff hosts - no viable hosts found.")
238
239 self.set_connections(hosts)
You need to set sniff_timeout to a higher value than the default (0.1 seconds, if I remember correctly).
Try it like this:
es = Elasticsearch(
['esnode1', 'esnode2'],
# sniff before doing anything
sniff_on_start=True,
# refresh nodes after a node fails to respond
sniff_on_connection_fail=True,
# and also every 60 seconds
sniffer_timeout=60,
# set sniffing request timeout to 10 seconds
sniff_timeout=10
)
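The reasoning behind this fix can be illustrated with a toy model: the sniff request has its own short timeout, so a node that is alive but slower than that timeout is counted as failed, and if no seed host qualifies, the client raises. The sketch below assumes that simplified model; `sniff`, `SniffError`, and the latency figures are hypothetical and are not elasticsearch-py's implementation.

```python
class SniffError(Exception):
    """Raised when no seed host answers the sniff request in time."""


def sniff(seed_latencies, sniff_timeout):
    """Return the hosts that answered within sniff_timeout (toy model).

    seed_latencies maps a host name to how long, in seconds, it takes to
    answer the node-info request; None means the host is down.
    """
    alive = [
        host
        for host, latency in seed_latencies.items()
        if latency is not None and latency <= sniff_timeout
    ]
    if not alive:
        raise SniffError("no seed host answered within sniff_timeout")
    return alive


# Assumed numbers: ip1 is up but needs 0.3 s to answer, ip2 is down.
latencies = {"ip1": 0.3, "ip2": None}
try:
    sniff(latencies, sniff_timeout=0.1)
except SniffError as exc:
    print("short timeout:", exc)  # prints short timeout: no seed host answered within sniff_timeout
print("longer timeout:", sniff(latencies, sniff_timeout=10))  # prints longer timeout: ['ip1']
```

With the short timeout, the slow but live node is treated the same as the dead one, which matches the symptom in the question; a longer timeout lets it through.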