Ansible async_status task - error: ansible_job_id "undefined variable"



I have a 3-node Ubuntu 20.04 LTS KVM Kubernetes cluster, and the KVM host is also Ubuntu 20.04 LTS. I run the playbooks on the KVM host. I have the following inventory excerpt:

nodes:
  hosts:
    sea_r:
      ansible_host: 192.168.122.60
    spring_r:
      ansible_host: 192.168.122.92
    island_r:
      ansible_host: 192.168.122.93
  vars:
    ansible_user: root

I have been trying to use async_status, but it always fails:

- name: root commands
  hosts: nodes
  tasks:
    - name: bash commands
      ansible.builtin.shell: |
        apt update
      args:
        chdir: /root
        executable: /bin/bash
      async: 2000
      poll: 2
      register: output
    - name: check progress
      ansible.builtin.async_status:
        jid: "{{ output.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      retries: 200
      delay: 5

The error:

fatal: [sea_r]: FAILED! => {"msg": "The task
includes an option with an undefined variable. 
The error was: 'dict object' has no attribute
'ansible_job_id' ...

If I try the following instead,

- name: root commands
  hosts: nodes
  tasks:
    - name: bash commands
      ansible.builtin.shell: |
        apt update
      args:
        chdir: /root
        executable: /bin/bash
      async: 2000
      poll: 2
      register: output
    - debug: msg="{{ output.stdout_lines }}"
    - debug: msg="{{ output.stderr_lines }}"

I get no errors. I also tried the following variation,

- name: check progress
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  with_items: "{{ output }}"
  register: job_result
  until: job_result.finished
  retries: 200
  delay: 5

which was suggested as a solution to a similar error. That did not help either; I just get a slightly different error:

fatal: [sea_r]: FAILED! => {"msg": "The task includes
an option with an undefined variable. The error 
was: 'ansible.utils.unsafe_proxy.AnsibleUnsafeText
object' has no attribute 'ansible_job_id' ...

At the beginning and end of the playbook, I resume and suspend my 3 KVM server nodes like so:

- name: resume vms
  hosts: local_vm_ctl
  tasks:
    - name: resume vm servers
      shell: |
        virsh resume kub3
        virsh resume kub2
        virsh resume kub1
        virsh list --state-paused --state-running
      args:
        chdir: /home/bi
        executable: /bin/bash
      environment:
        LIBVIRT_DEFAULT_URI: qemu:///system
      register: output
    - debug: msg="{{ output.stdout_lines }}"
    - debug: msg="{{ output.stderr_lines }}"

and

- name: pause vms
  hosts: local_vm_ctl
  tasks:
    - name: suspend vm servers
      shell: |
        virsh suspend kub3
        virsh suspend kub2
        virsh suspend kub1
        virsh list --state-paused --state-running
      args:
        chdir: /home/bi
        executable: /bin/bash
      environment:
        LIBVIRT_DEFAULT_URI: qemu:///system
      register: output
    - debug: msg="{{ output.stdout_lines }}"
    - debug: msg="{{ output.stderr_lines }}"

But I do not see how these plays could be related to the error above.

Any help would be much appreciated.

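You get an undefined error for your job ID because:

  1. You used poll: X on your initial task, so Ansible connects every X seconds to check whether the task has finished.
  2. By the time Ansible exits that task and moves on to the async_status task, the job is already done. Because you used a non-zero value for poll, the async status cache is cleared automatically.
  3. Since the cache has been cleared, the job ID no longer exists.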

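Your scenario above is meant to avoid timeouts with your target on long-running tasks, not to run tasks concurrently and check on their status later. For that second requirement, you need to run the async task with poll: 0 and clean up the cache yourself.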
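As a minimal sketch of the first case (a non-zero poll used only to avoid a connection timeout), the registered variable can be read directly once the task returns, so no async_status task is involved. The task below reuses the shell command from the question; the task names are only illustrative.

```yaml
# Sketch: with a non-zero poll, Ansible waits for the job itself,
# so the registered variable already holds the final result.
- name: Update apt cache without hitting a connection timeout
  ansible.builtin.shell: |
    apt update
  args:
    chdir: /root
    executable: /bin/bash
  async: 2000   # maximum runtime in seconds
  poll: 2       # Ansible checks every 2 seconds and blocks until the job ends
  register: output

# No async_status task is needed here; read the result directly.
- name: Show what the command printed
  ansible.builtin.debug:
    msg: "{{ output.stdout_lines }}"
```

This matches the variant in the question that ran without errors: the job has already finished when the debug task runs.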

For more explanation of the concepts above, see the documentation:

  • Ansible async guide
  • Ansible async_status module

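I made an example from your task above and changed it to use the dedicated apt module (note that you could add a name option to the module with one package or a list of packages, and Ansible would do both the cache update and the install in a single step). Also, retries * delay on the async_status task should be equal to or greater than async on the initial task if you want to make sure you do not miss the end of the job.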

- name: Update apt cache
  ansible.builtin.apt:
    update_cache: true
  async: 2000
  poll: 0
  register: output
- name: check progress
  ansible.builtin.async_status:
    jid: "{{ output.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 400
  delay: 5
- name: clean async job cache
  ansible.builtin.async_status:
    jid: "{{ output.ansible_job_id }}"
    mode: cleanup

This is more useful for launching a bunch of long-running tasks in parallel. Here is a useless yet functional example:

- name: launch some loooooong tasks
  shell: "{{ item }}"
  loop:
    - sleep 30
    - sleep 20
    - sleep 35
  async: 100
  poll: 0
  register: long_cmd
- name: wait until all commands are done
  async_status:
    jid: "{{ item.ansible_job_id }}"
  register: async_poll_result
  until: async_poll_result.finished
  retries: 50
  delay: 2
  loop: "{{ long_cmd.results }}"
- name: clean async job cache
  async_status:
    jid: "{{ item.ansible_job_id }}"
    mode: cleanup
  loop: "{{ long_cmd.results }}"
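To keep retries * delay from drifting below async, one option is to derive the retry count from the timeout instead of writing both numbers by hand. The play below is only a sketch: job_timeout and check_delay are hypothetical variable names introduced for this illustration, and it assumes your Ansible version templates the async, retries and delay keywords (if it does not, write the literal values instead).

```yaml
- name: Derive the retry budget from the async timeout
  hosts: nodes
  vars:
    job_timeout: 2000   # hypothetical: seconds the job may run (used for async)
    check_delay: 5      # hypothetical: seconds between async_status checks
  tasks:
    - name: Update apt cache
      ansible.builtin.apt:
        update_cache: true
      async: "{{ job_timeout }}"
      poll: 0
      register: output

    - name: check progress
      ansible.builtin.async_status:
        jid: "{{ output.ansible_job_id }}"
      register: job_result
      until: job_result.finished
      # ceil(2000 / 5) = 400 checks, so retries * delay >= async
      retries: "{{ (job_timeout / check_delay) | round(0, 'ceil') | int }}"
      delay: "{{ check_delay }}"

    - name: clean async job cache
      ansible.builtin.async_status:
        jid: "{{ output.ansible_job_id }}"
        mode: cleanup
```

With these values the check task polls for up to 400 * 5 = 2000 seconds, which covers the full async window.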

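You have poll: 2 on your task, which tells Ansible to poll the async job internally every 2 seconds and return the final status in the registered variable. In order to use async_status, you should set poll: 0 so that the task does not wait for the job to finish.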
