我有一个服务450多个代理的Asterisk服务器,它已经投入生产一年多了。每隔一段时间,我们就会经历3秒的音频丢失,然后是1秒的音频混乱。流量数据和Wireshark回放显示它起源于这个系统。今天发现的症状是,在这些短时间的音频问题中,日志记录是无序的,即时间戳中的秒前后跳跃。在RTP捕获中,1秒的乱码音频由许多"错误时间戳"标记组成。
在下面的日志中,你会看到它在12:20:45和12:20:48之间跳跃。
日志snipit:
[2014-08-20 12:20:44] DEBUG[14835] devicestate.c: Changing state for SIP/SIP/peertrunk4-0000211d - state 4 (Invalid)
[2014-08-20 12:20:44] DEBUG[14854] app_queue.c: Device 'SIP/SIP/peertrunk4-0000211d' changed to state '4' (Invalid) but we don't care because they'r$
[2014-08-20 12:20:44] DEBUG[7558] rtp.c: Got RTCP report of 64 bytes
[2014-08-20 12:20:44] DEBUG[26007] rtp.c: Got RTCP report of 64 bytes
[2014-08-20 12:20:45] DEBUG[27190] rtp.c: Got RTCP report of 64 bytes
[2014-08-20 12:20:45] DEBUG[14853] chan_sip.c: = No match Their Call ID: 178f1fb01ee44c39299de3af40b487c9@10.200.5.7 Their Tag Our tag: as28e70c40
[2014-08-20 12:20:45] DEBUG[26897] app_dial.c: Dunno what to do with control type 25
[2014-08-20 12:20:45] DEBUG[7558] res_musiconhold.c: SIP/peertrunk3-000013d1 Opened file 0 '/home/asterisk/env/agentwaiting/agent-waiting-long'
[2014-08-20 12:20:48] DEBUG[7558] rtp.c: Difference is 31288, ms is 3931
[2014-08-20 12:20:45] DEBUG[27244] rtp.c: Got RTCP report of 64 bytes
[2014-08-20 12:20:45] DEBUG[24681] rtp.c: Got RTCP report of 64 bytes
[2014-08-20 12:20:45] DEBUG[24578] rtp.c: Got RTCP report of 44 bytes
[2014-08-20 12:20:48] WARNING[26897] channel.c: Exceptionally long voice queue length queuing to Local/NPANXXXXXX@celeus-3920,1
[2014-08-20 12:20:45] DEBUG[23512] channel.c: Scheduling timer at 0 sample intervals
[2014-08-20 12:20:48] DEBUG[23512] channel.c: Stopped silence generator on 'SIP/peertrunk3-00001029'
[2014-08-20 12:20:48] DEBUG[23512] channel.c: Set channel SIP/peertrunk3-00001029 to write format gsm
[2014-08-20 12:20:45] DEBUG[15150] manager.c: Manager received command 'StopMonitor'
[2014-08-20 12:20:45] DEBUG[27025] channel.c: Scheduling timer at 138 sample intervals
[2014-08-20 12:20:48] DEBUG[15150] channel.c: Avoiding initial deadlock for channel '0x7f31c8787b10'
[2014-08-20 12:20:45] DEBUG[26717] rtp.c: Got RTCP report of 64 bytes
[2014-08-20 12:20:45] DEBUG[27356] channel.c: Scheduling timer at 0 sample intervals
[2014-08-20 12:20:48] DEBUG[27356] channel.c: Scheduling timer at 0 sample intervals
[2014-08-20 12:20:48] DEBUG[27356] channel.c: Scheduling timer at 0 sample intervals
[2014-08-20 12:20:48] DEBUG[15150] channel.c: Avoiding initial deadlock for channel '0x7f31c8787b10'
[2014-08-20 12:20:48] DEBUG[27356] chan_agent.c: Waited for stream, result '0'
[2014-08-20 12:20:48] DEBUG[27356] chan_agent.c: Set read format, result '0'
[2014-08-20 12:20:48] DEBUG[27356] chan_agent.c: Set write format, result '0'
[2014-08-20 12:20:48] DEBUG[27356] devicestate.c: Notification of state change to be queued on device/channel Agent/436
[2014-08-20 12:20:45] DEBUG[26724] rtp.c: Got RTCP report of 44 bytes
[2014-08-20 12:20:48] VERBOSE[27356] logger.c: [2014-08-20 12:20:48] -- Called Agent/436
[2014-08-20 12:20:48] DEBUG[27356] devicestate.c: No provider found, checking channel drivers for Agent - 436
[2014-08-20 12:20:48] DEBUG[15150] channel.c: Avoiding initial deadlock for channel '0x7f31c8787b10'
[2014-08-20 12:20:48] DEBUG[23512] channel.c: Set channel SIP/peertrunk3-00001029 to write format slin
[2014-08-20 12:20:48] DEBUG[23512] channel.c: Scheduling timer at 160 sample intervals
[2014-08-20 12:20:48] DEBUG[23512] channel.c: Started silence generator on 'SIP/peertrunk3-00001029'
[2014-08-20 12:20:48] DEBUG[23512] rtp.c: Difference is 30512, ms is 3834
[2014-08-20 12:20:48] DEBUG[23512] rtp.c: Got RTCP report of 64 bytes
[2014-08-20 12:20:45] DEBUG[18409] rtp.c: Got RTCP report of 64 bytes
[2014-08-20 12:20:45] DEBUG[12607] res_musiconhold.c: SIP/peertrunk3-00002937 Opened file 0 '/home/asterisk/env/agentwaiting/agent-waiting-long'
[2014-08-20 12:20:45] DEBUG[26469] rtp.c: Got RTCP report of 44 bytes
[2014-08-20 12:20:45] DEBUG[26877] rtp.c: Got RTCP report of 64 bytes
[2014-08-20 12:20:45] DEBUG[26002] rtp.c: Got RTCP report of 64 bytes
[2014-08-20 12:20:48] VERBOSE[27356] logger.c: [2014-08-20 12:20:48] -- Agent/436 answered Local/NPANXXXXXX@celeus-bccf,1
环境:
- VMware HA集群,服务约50 vm
- ESXi 5.1
- 没有过度承诺的资源&此虚拟机的CPU/磁盘IO分配设置为"高"
- CentOS 6.5
- Asterisk C.3.7.2 (Business Edition, 1.4 trunk)
- vmware-tools是当前的
- 主机时间同步,而不是NTPd
我在vCenter提供给我的历史或实时统计中没有看到任何有趣的东西,尽管我完全承认我可能在所有错误的地方寻找。另外,这3秒的异常可能在图中平均。
我假设导致日志混乱的原因也会导致音频丢失。在我的搜索中,我无法找到任何无序日志条目的引用。以前有人遇到过这个问题吗?
看起来是时间重新同步的问题。星号写的时候没有考虑到时间变化的可能性,因为它是为通用硬件设计的。
可以建议如下
1)去掉vmware,换上硬件主机
或
2)禁用主机时间同步,设置为ntp。