How to use the second host in a group when host 1 is down



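I have an inventory with 500+ groups. Each group has two hosts: one is the primary and the other is the secondary. My playbook does two things: it collects the list of unreachable hosts, and it executes a command on the primary server.

My inventory looks like this (I have 500+ groups like this one, set1 through set500):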

[set1]
host1set1 setup=primary
host2set1 setup=secondary
[set1:vars]
setidentifier=set1

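I am able to collect the list of unreachable hosts and to execute the command on the primary server. Now I want to know how to execute the command on the secondary server only when the primary server is unreachable.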

- delegate_to: localhost
  command: ping -c1 "{{ hostvars[inventory_hostname].ansible_host|default(inventory_hostname) }}"
  register: ping
  ignore_errors: true
  become: false
- set_fact:
    available: "{{ ping.rc == 0 }}"
- lineinfile:
    dest: "/tmp/available.txt"
    line: "{{ hostvars[inventory_hostname]. }} : {{ inventory_hostname }}"
    regexp: "Host: {{ inventory_hostname }}"
    create: true
  delegate_to: localhost
  become: false
  when: "{{ hostvars[inventory_hostname].available }} == False"
- shell: date
  register: dateout
  when: "setup is search('primary')"

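I need the shell task to run on the secondary server only when my primary server is unreachable. (The date command is just for reference; I have other things to do there.)

Can anyone shed some light on this? How can I do it?

First, ping the group. Hosts that are unreachable are excluded from the rest of the play.

Here is my inventory: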

[groupA]
AnsibleTower ansible_host=192.168.124.8
[groupB]
jaxsat ansible_host=192.168.124.111
rhel7.5 ansible_host=192.168.124.4

Now, here is the playbook without the ping:

---
- hosts: groupB
  gather_facts: no
  connection: ssh
  tasks:
    - name: Run hostname command
      command: /bin/hostname
      register: result
      run_once: yes
    - name: Show result
      debug:
        var: result
      run_once: yes

It fails because jaxsat is down:

$ ansible-playbook -i inventory/ test_one_only.yml
PLAY [groupB] **********************************************************************************
TASK [Run hostname command] ********************************************************************
Tuesday 21 July 2020  13:40:38 -0400 (0:00:00.064)       0:00:00.064 ********** 
fatal: [jaxsat]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.124.111 port 22: No route to host\r\n", "unreachable": true}
[WARNING]: Failure using method (v2_runner_on_unreachable) in callback plugin
(<ansible.plugins.callback.mail.CallbackModule object at 0x7f95688fbf10>): [Errno 111]
Connection refused

NO MORE HOSTS LEFT *****************************************************************************
to retry, use: --limit @/home/jack/Ansible/TEST/test_one_only.retry
PLAY RECAP *************************************************************************************
jaxsat                     : ok=0    changed=0    unreachable=1    failed=0   
Tuesday 21 July 2020  13:40:42 -0400 (0:00:03.218)       0:00:03.283 ********** 
=============================================================================== 
Run hostname command -------------------------------------------------------------------- 3.22s
Playbook run took 0 days, 0 hours, 0 minutes, 3 seconds

Now I add the ping:

---
- hosts: groupB
  gather_facts: no
  connection: ssh
  tasks:
    - name: ping all hosts
      ping:
    - name: Run hostname command
      command: /bin/hostname
      register: result
      run_once: yes
    - name: Show result
      debug:
        var: result
      run_once: yes

And jaxsat is not attempted for the remaining tasks:

$ ansible-playbook -i inventory/ test_one_only.yml
PLAY [groupB] **********************************************************************************
TASK [ping all hosts] **************************************************************************
Tuesday 21 July 2020  13:42:12 -0400 (0:00:00.057)       0:00:00.057 ********** 
ok: [rhel7.5]
fatal: [jaxsat]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 192.168.124.111 port 22: No route to host\r\n", "unreachable": true}
[WARNING]: Failure using method (v2_runner_on_unreachable) in callback plugin
(<ansible.plugins.callback.mail.CallbackModule object at 0x7f07f21ad850>): [Errno 111]
Connection refused

TASK [Run hostname command] ********************************************************************
Tuesday 21 July 2020  13:42:15 -0400 (0:00:03.217)       0:00:03.275 ********** 
changed: [rhel7.5]
TASK [Show result] *****************************************************************************
Tuesday 21 July 2020  13:42:15 -0400 (0:00:00.348)       0:00:03.623 ********** 
ok: [rhel7.5] => {
    "result": {
        "changed": true,
        "cmd": [
            "/bin/hostname"
        ],
        "delta": "0:00:00.001530",
        "end": "2020-07-21 13:42:15.921749",
        "failed": false,
        "rc": 0,
        "start": "2020-07-21 13:42:15.920219",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "localhost.localdomain.localdomain",
        "stdout_lines": [
            "localhost.localdomain.localdomain"
        ]
    }
}
to retry, use: --limit @/home/jack/Ansible/TEST/test_one_only.retry
PLAY RECAP *************************************************************************************
jaxsat                     : ok=0    changed=0    unreachable=1    failed=0   
rhel7.5                    : ok=3    changed=1    unreachable=0    failed=0   
Tuesday 21 July 2020  13:42:15 -0400 (0:00:00.037)       0:00:03.661 ********** 
=============================================================================== 
ping all hosts -------------------------------------------------------------------------- 3.22s
Run hostname command -------------------------------------------------------------------- 0.35s
Show result ----------------------------------------------------------------------------- 0.04s
Playbook run took 0 days, 0 hours, 0 minutes, 3 seconds

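To execute a task on only a single host in the group, put run_once: true on that task; the Ansible documentation on run_once has the details.

If you know that the primary host is reachable, you can set a fact on the rest of the hosts in the group, such as execute_on_secondary = False, and use it in the when condition of the block that holds your tasks: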

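- set_fact:
    execute_on_secondary: False
  delegate_to: "{{ item }}"
  # delegate_facts is needed so the fact is stored on each delegated host
  delegate_facts: yes
  with_items: "{{ play_hosts }}"
  run_once: yes
  when: ## condition for when primary is reachable ##
- block:
    # tasks here
  # default(true) covers the case where the set_fact above was skipped
  when: setup == "primary" or (execute_on_secondary | default(true))

The approach below is quite hacky. Test it thoroughly before using it on business-critical systems.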

- hosts: all
  serial: 2
  tasks:
    - set_fact: ## same as above ##
      run_once: yes
    - block:
        # tasks here
      # default(true) covers the case where the set_fact above was skipped
      when: setup == "primary" or (execute_on_secondary | default(true))
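
For the inventory in the question, the ideas above could be combined roughly as follows. This is only a sketch under some assumptions: every setN group defines setidentifier and marks exactly one host with setup=primary, both hosts of a set are in the same play run (no --limit that leaves one out, no serial batching that splits a set), and the fact names available and primary_up are made up for illustration. It uses wait_for_connection with ignore_errors so that a dead host is recorded as failed for that one task instead of being dropped, which lets the secondary look up its primary's state through hostvars.

```yaml
# Sketch only, not a tested production playbook.
# Assumes: setidentifier and setup come from the inventory shown in the question;
# "available" and "primary_up" are illustrative fact names.
- hosts: all
  gather_facts: no
  tasks:
    - name: Probe the connection and record a failure instead of aborting
      wait_for_connection:
        timeout: 5
      register: probe
      ignore_errors: yes

    - name: Remember whether this host answered
      set_fact:
        available: "{{ probe is not failed }}"

    - name: Work out whether the primary of this host's set answered
      set_fact:
        primary_up: >-
          {{ groups[setidentifier]
             | map('extract', hostvars)
             | selectattr('setup', 'equalto', 'primary')
             | selectattr('available')
             | list | length > 0 }}

    - name: Run on a reachable primary, or on the secondary when the primary is down
      shell: date
      register: dateout
      when: available and (setup == 'primary' or not primary_up)
```

With the default linear strategy, every host finishes the second task before any host starts the third, so the available fact of the primary is already set when the secondary evaluates primary_up. If both hosts of a set are down, the last task is simply skipped for that set.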