Thread monitoring using heartbeat signals in C



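I want to monitor threads. I do this by sending and receiving heartbeat (HB) and acknowledgement (ACK) signals.
scnMonitor_t is the monitor structure. When a new thread is added, it registers with the monitor and is added to the scnThreadList_t list. monitorHeartbeatCheck is a thread that starts with the program, and monitorHeartbeatProcess is an API that is called from every thread function.

My actual problem is that the process index is not followed properly: it ends up waiting on the HB condition for the third thread, and a deadlock occurs. What could the problem be?
Thanks in advance.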

typedef struct scnThreadList_{
        osiThread_t     thread;
        struct scnThreadList_   *next;
} scnThreadList_t;
typedef struct scnMonitor_{
        bool            started;
        osiThread_t     heartbeatThread; 
        osiMutex_t      heartbeatMutex;
        osiMutex_t      ackMutex;
        osiCond_t       heartbeatCond;
        scnThreadList_t *threads;
} scnMonitor_t; 
static scnMonitor_t *s_monitor = NULL;
// Main heartbeat check thread
void* monitorHeartbeatCheck( void *handle )
{
        scnThreadList_t *pObj = NULL;
        static int idx = 0;
        static bool waitAck = false;
        while ( 1 ) {
                pObj = s_monitor->threads;
                while ( pObj && ( pObj != s_monitor->heartbeatThread ) ) { // skip itself from monitoring
                        ++idx;
                        printf("\"HB Check No.%d\"\n",idx);
                        // send heartbeat
                        usleep( 250 * 1000 );
                        pthread_mutex_lock( s_monitor->heartbeatMutex, 1 );
                        pthread_cond_signal( s_monitor->heartbeatCond );
                        printf("-->C %d HB sent\n",idx);
                        pthread_mutex_unlock( s_monitor->heartbeatMutex );
                        // wait for ACK
                        while( !waitAck ){
                                pthread_mutex_lock( s_monitor->ackMutex, 1 );
                                printf("|| C %d wait Ack\n",idx);
                                waitAck = true;
                                pthread_cond_wait( s_monitor->heartbeatCond, s_monitor->ackMutex );
                                waitAck = false;
                                printf("<--C %d received Ack\n",idx);
                                pthread_mutex_unlock( s_monitor->ackMutex );
                                LOG_INFO( SCN_MONITOR, "ACK from thread %p \n", pObj->thread );
                        }
                        pObj = pObj->next;
                }
        } // while, infinite
        return NULL;
}
// Waits for heartbeat and acknowledges
// Call this API from every thread function that is registered
int monitorHeartbeatProcess( void )
{
        static int id = 0;
        static bool waitHb = false;
        ++id;
        printf("\"HB Process No.%d\"\n",id);
        // wait for HB
        while( !waitHb ){
                pthread_mutex_lock( s_monitor->heartbeatMutex, 1 );
                printf("|| P %d wait for HB\n",id);
                waitHb = true;
                pthread_cond_wait( s_monitor->heartbeatCond, s_monitor->heartbeatMutex );
                waitHb = false;
                printf("<--P %d HB received \n",id);
                pthread_mutex_unlock( s_monitor->heartbeatMutex );
        }
        // send ACK
        usleep( 250 * 1000 );
        pthread_mutex_lock( s_monitor->ackMutex, 1 );
        pthread_cond_signal( s_monitor->heartbeatCond );
        printf("-->P %d ACK sent\n",id);
        pthread_mutex_unlock( s_monitor->ackMutex );
        return 1;
}

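Only one mutex may be associated with a condition variable at a time. Using two different mutexes with the same condition variable at the same time can cause unpredictable serialization problems in your application.

http://publib.boulder.ibm.com/infocenter/iseries/v5r4/index.jsp?topic=%2Fapis%2Fusers_78.htm

You have two different mutexes, heartbeatMutex and ackMutex, associated with your single condition variable heartbeatCond.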
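
To make that concrete, here is a minimal sketch of a heartbeat/ACK exchange that pairs one condition variable with exactly one mutex. It assumes plain POSIX threads rather than the osi* wrappers from the question, and the names hb_state_t, hb_channel_t, hb_worker and hb_run_rounds are made up for illustration. A shared state variable is used as the predicate, so a wakeup is never lost or misread, and pthread_cond_broadcast is used because both sides wait on the same condition variable.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names, not part of the code in the question. */
typedef enum { HB_IDLE, HB_SENT, HB_ACKED } hb_state_t;

typedef struct {
    pthread_mutex_t mutex;   /* the only mutex ever used with cond */
    pthread_cond_t  cond;
    hb_state_t      state;   /* predicate, always accessed under mutex */
    int             rounds;  /* heartbeats the worker should answer */
} hb_channel_t;

/* Worker side: wait for a heartbeat, then acknowledge it. */
static void *hb_worker(void *arg)
{
    hb_channel_t *ch = arg;
    for (int i = 0; i < ch->rounds; ++i) {
        pthread_mutex_lock(&ch->mutex);
        while (ch->state != HB_SENT)      /* loop handles spurious wakeups */
            pthread_cond_wait(&ch->cond, &ch->mutex);
        ch->state = HB_ACKED;
        pthread_cond_broadcast(&ch->cond);
        pthread_mutex_unlock(&ch->mutex);
    }
    return NULL;
}

/* Monitor side: send `rounds` heartbeats, return how many were acknowledged. */
int hb_run_rounds(int rounds)
{
    hb_channel_t ch;
    pthread_t tid;
    int acked = 0;

    pthread_mutex_init(&ch.mutex, NULL);
    pthread_cond_init(&ch.cond, NULL);
    ch.state = HB_IDLE;
    ch.rounds = rounds;

    int rc = pthread_create(&tid, NULL, hb_worker, &ch);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < rounds; ++i) {
        pthread_mutex_lock(&ch.mutex);
        ch.state = HB_SENT;               /* send heartbeat */
        pthread_cond_broadcast(&ch.cond);
        while (ch.state != HB_ACKED)      /* wait for ACK on the same mutex */
            pthread_cond_wait(&ch.cond, &ch.mutex);
        ch.state = HB_IDLE;
        ++acked;
        pthread_mutex_unlock(&ch.mutex);
    }

    pthread_join(tid, NULL);
    pthread_cond_destroy(&ch.cond);
    pthread_mutex_destroy(&ch.mutex);
    return acked;
}
```

Calling hb_run_rounds(5) from main should return 5 once every heartbeat has been acknowledged. Monitoring several threads would need one such channel per thread, which this sketch does not show.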

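I think you have a deadlock here. The thread that calls monitorHeartbeatProcess() locks heartbeatMutex and waits for a signal on the condition variable heartbeatCond. The thread that calls monitorHeartbeatCheck() locks ackMutex and also waits for a signal on heartbeatCond. So both threads end up waiting on the same condition variable, heartbeatCond, with nobody left to signal it, and that is the deadlock. If you are so particular about using two mutexes, why not use two condition variables as well?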
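
A rough sketch of the two-condition-variable layout could look like the following. It assumes plain POSIX threads instead of the osi* wrappers, and ackCond, hbPending, ackPending, worker_thread and monitor_rounds are names invented for this example. Each condition variable has its own mutex and its own flag, so a heartbeat can never be mistaken for an ACK.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Heartbeat direction: monitor -> worker. */
static pthread_mutex_t heartbeatMutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  heartbeatCond  = PTHREAD_COND_INITIALIZER;
static bool            hbPending      = false;  /* hypothetical flag */

/* ACK direction: worker -> monitor (ackCond is an assumed addition). */
static pthread_mutex_t ackMutex   = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ackCond    = PTHREAD_COND_INITIALIZER;
static bool            ackPending = false;      /* hypothetical flag */

static void *worker_thread(void *arg)
{
    int rounds = *(int *)arg;
    for (int i = 0; i < rounds; ++i) {
        /* wait for HB on heartbeatCond + heartbeatMutex only */
        pthread_mutex_lock(&heartbeatMutex);
        while (!hbPending)
            pthread_cond_wait(&heartbeatCond, &heartbeatMutex);
        hbPending = false;
        pthread_mutex_unlock(&heartbeatMutex);

        /* send ACK on ackCond + ackMutex only */
        pthread_mutex_lock(&ackMutex);
        ackPending = true;
        pthread_cond_signal(&ackCond);
        pthread_mutex_unlock(&ackMutex);
    }
    return NULL;
}

/* Sends `rounds` heartbeats and returns the number of ACKs received. */
int monitor_rounds(int rounds)
{
    pthread_t tid;
    int acks = 0;

    int rc = pthread_create(&tid, NULL, worker_thread, &rounds);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < rounds; ++i) {
        /* send HB */
        pthread_mutex_lock(&heartbeatMutex);
        hbPending = true;
        pthread_cond_signal(&heartbeatCond);
        pthread_mutex_unlock(&heartbeatMutex);

        /* wait for ACK */
        pthread_mutex_lock(&ackMutex);
        while (!ackPending)
            pthread_cond_wait(&ackCond, &ackMutex);
        ackPending = false;
        ++acks;
        pthread_mutex_unlock(&ackMutex);
    }

    pthread_join(tid, NULL);
    return acks;
}
```

Because the flags are checked in a while loop, it does not matter whether the signal arrives before or after the other side starts waiting, which is the ordering problem the static waitAck/waitHb variables in the question cannot solve.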

When signaling, don't take the mutex. Use the mutex only while waiting. That means:

pthread_mutex_lock( s_monitor->ackMutex, 1 );  ----> remove this line
pthread_cond_signal( s_monitor->heartbeatCond );
pthread_mutex_unlock( s_monitor->ackMutex );    ----> remove this line.
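
One caveat, shown as a small sketch under my own assumptions (plain POSIX threads; ack_t, ack_sender and wait_for_ack are invented names): POSIX does allow pthread_cond_signal to be called without holding the mutex, but any shared flag the waiter checks should still be changed while the mutex is held, otherwise the wakeup can be lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical helper type, not part of the code in the question. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            acked;   /* predicate the waiter checks */
} ack_t;

static void *ack_sender(void *arg)
{
    ack_t *a = arg;

    pthread_mutex_lock(&a->mutex);
    a->acked = true;                 /* change the predicate under the mutex */
    pthread_mutex_unlock(&a->mutex);

    pthread_cond_signal(&a->cond);   /* the signal itself is sent unlocked */
    return NULL;
}

/* Starts a sender thread and waits for its ACK; returns the flag value seen. */
bool wait_for_ack(void)
{
    ack_t a;
    pthread_t tid;
    bool result;

    pthread_mutex_init(&a.mutex, NULL);
    pthread_cond_init(&a.cond, NULL);
    a.acked = false;

    int rc = pthread_create(&tid, NULL, ack_sender, &a);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&a.mutex);
    while (!a.acked)                 /* re-check after every wakeup */
        pthread_cond_wait(&a.cond, &a.mutex);
    result = a.acked;
    pthread_mutex_unlock(&a.mutex);

    pthread_join(tid, NULL);         /* sender is done before we destroy */
    pthread_cond_destroy(&a.cond);
    pthread_mutex_destroy(&a.mutex);
    return result;
}
```

Here wait_for_ack() should return true whether the sender runs before or after the waiter reaches pthread_cond_wait, because the flag records the ACK even if nobody was waiting when the signal was sent.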
