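I run a DB cluster that depends on accurate time. Unfortunately, my VM host sometimes migrates a VM running one of these DB nodes to another host, and the clock is then off. When that happens, the DB node shuts down and is restarted by systemd.
My systemd unit file contains the following: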
ExecStartPre=-+/usr/bin/chronyc -a makestep
ExecStart=/usr/local/bin/.......
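I expected this to sync the time immediately after the database shuts down because of such a time lag. But according to my logs, it took 7 minutes before the discrepancy was detected and fixed. On every restart, my database detected the gap and shut down again. In the end, I got this chronyd log: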
Nov 16 10:25:51 dc3-sirius chronyd[164166]: System clock was stepped by 0.000020 seconds
Nov 16 10:26:07 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:26:23 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:26:39 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:26:55 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:27:11 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:27:27 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:27:43 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:27:59 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:28:15 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:28:31 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:28:47 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:28:59 dc3-sirius chronyd[164166]: Source 81.169.199.94 replaced with 212.71.244.243
Nov 16 10:29:03 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:29:19 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:29:32 dc3-sirius chronyd[164166]: Selected source 109.230.227.90
Nov 16 10:29:35 dc3-sirius chronyd[164166]: System clock was stepped by 0.003850 seconds
Nov 16 10:29:51 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:30:07 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:30:23 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:30:39 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:30:55 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:31:11 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:31:27 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:31:43 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:31:59 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:32:13 dc3-sirius chronyd[164166]: Can't synchronise: no majority
Nov 16 10:32:15 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:32:31 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:32:33 dc3-sirius chronyd[164166]: Selected source 109.230.227.90
Nov 16 10:32:33 dc3-sirius chronyd[164166]: System clock wrong by 1.101260 seconds, adjustment started
Nov 16 10:32:48 dc3-sirius chronyd[164166]: System clock was stepped by 1.003151 seconds
Nov 16 10:33:04 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:33:21 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:33:37 dc3-sirius chronyd[164166]: System clock was stepped by -0.000000 seconds
Nov 16 10:33:51 dc3-sirius chronyd[164166]: Selected source 162.159.200.123
Nov 16 10:33:53 dc3-sirius chronyd[164166]: System clock was stepped by 0.409613 seconds
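As you can see, it took more than 7 minutes:
My database detected the problem at 10:25:51. From then on, the command above was executed before every database restart to resync the clock. But it took until 10:32:33 and 10:33:53 for the clock to actually be fixed.
Do you know how I can get the clock synced right away instead of several minutes later?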
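To make the delay easier to see, the log can be filtered down to the steps that actually moved the clock. The following is only a rough sketch: the function name `significant_steps` and the 0.1 second threshold are my own assumptions, and it expects lines in the same format as the chronyd log above.

```shell
# significant_steps THRESHOLD
# Read chronyd log lines on stdin and print the time of day and the size of
# every step whose magnitude exceeds THRESHOLD seconds.
# (Hypothetical helper, not part of chrony.)
significant_steps() {
    awk -v thr="$1" '
        /System clock was stepped by/ {
            # The step size is the second-to-last field: "... by X seconds"
            step = $(NF - 1) + 0
            mag = (step < 0) ? -step : step
            # $3 is the HH:MM:SS timestamp in syslog-style lines
            if (mag > thr) print $3, $(NF - 1)
        }'
}
```

Fed the log above with a threshold of 0.1, this should leave only the two steps at 10:32:48 and 10:33:53. On a live system the input could come from something like `journalctl -u chronyd`, assuming the unit is named chronyd.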
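I finally found a solution that keeps chrony and forces an immediate time sync whenever a time lag (detected by the DB node) occurs. The solution is to restart the chronyd service, which simulates a reboot of the system.
I changed the database's systemd unit file to this: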
ExecStartPre=-+systemctl restart chronyd
ExecStartPre=/bin/sleep 5
ExecStart=/usr/local/bin/cockroach start ...
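The fixed `sleep 5` is a guess at how long chronyd needs after the restart. A possible variation, which I have not verified on this setup, is to wait with `chronyc waitsync` until chronyd reports that it is synchronised. In the sketch below the function name `resync_clock` and the default values are assumptions of mine; the argument order for `waitsync` (max-tries, max-correction, max-skew, interval) is as I understand it from the chronyc man page.

```shell
# resync_clock [MAX_TRIES] [MAX_CORRECTION]
# Restart chronyd, then block until it reports being synchronised with a
# remaining correction below MAX_CORRECTION seconds, polling once per second
# for at most MAX_TRIES attempts. Returns non-zero if either step fails.
# (Hypothetical wrapper; the defaults are illustrative only.)
resync_clock() {
    max_tries="${1:-30}"
    max_correction="${2:-0.5}"
    systemctl restart chronyd || return 1
    # waitsync arguments: max-tries max-correction max-skew interval
    chronyc waitsync "$max_tries" "$max_correction" 0 1
}
```

Saved as a script and called from a single `ExecStartPre=` line, this would replace both the restart and the sleep, and the database would only start once the clock is reported as synchronised or the tries are used up.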
In /etc/chrony.conf I added the following lines:
initstepslew 0.5 pool.ntp.org
makestep 0.5 -1
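With these settings, chronyd steps the clock on restart whenever the time offset is larger than 0.5 seconds, instead of slewing it slowly (the -1 in makestep means there is no limit on the number of clock updates in which a step is allowed).
This finally made my system resync right away, so the database node can be restarted immediately afterwards.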
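To confirm that the clock really is back within tolerance before the database starts, the "System time" line of `chronyc tracking` can be checked. This is a minimal sketch: the function name `offset_within` is hypothetical, and it assumes the usual human-readable tracking output, where that line reads like "System time     : 0.000020 seconds fast of NTP time".

```shell
# offset_within LIMIT
# Read `chronyc tracking` output on stdin and succeed only if a
# "System time" line is present and its offset is below LIMIT seconds.
# (Hypothetical helper, not part of chrony.)
offset_within() {
    awk -v limit="$1" '
        /^System time/ {
            # Fields: "System" "time" ":" OFFSET "seconds" "fast|slow" ...
            found = 1
            if ($4 + 0 < limit + 0) ok = 1
        }
        END { exit (found && ok) ? 0 : 1 }'
}
```

For example, `chronyc tracking | offset_within 0.5` would exit with status 0 only if the reported offset is below the 0.5 second threshold used above.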
You can find more information about the chrony.conf options in the chrony documentation.