Why does calling pthread_exit() after pthread_detach() cause a SEGV in extremely rare cases?

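I am getting a SEGV in C++, inside my call to pthread_join(), while my application is shutting down. I cannot easily reproduce it (it occurs in roughly one in 100,000 test runs). I checked the value of errno and it is zero. This is running on CentOS v4.

Under what conditions would pthread_join() get a SEGV? This might be some kind of race condition, since it is extremely rare. One person suggested that I should not be calling both pthread_detach() and pthread_exit(), but I am not clear on why.

My first working hypothesis was that pthread_join() is being called while pthread_exit() is still running in the other thread, and that this somehow leads to the SEGV, but many people have stated that this is not an issue.

The failing code, which gets the SEGV in the main thread during application exit, looks roughly like this (error return code checking omitted for brevity):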

// During application startup, this function is called to create the child thread:
return_val = pthread_create(&_threadId, &attr,
                            (void *(*)(void *))initialize,
                            (void *)this);
// Apparently this next line is the issue:
return_val = pthread_detach(_threadId);
// Later during exit the following code is executed in the main thread:
// This main thread waits for the child thread exit request to finish:
// Release condition so child thread will exit:
releaseCond(mtx(), startCond(), &startCount);
// Wait until the child thread is done exiting so we don't delete memory it is
// using while it is shutting down.
waitOnCond(mtx(), endCond(), &endCount, 0);
// The above wait completes at the point that the child thread is about
// to call pthread_exit().
// It is unspecified whether a thread that has exited but remains unjoined
// counts against {PTHREAD_THREADS_MAX}, hence we must do pthread_join() to
// avoid possibly leaking the threads we destroy.
pthread_join(_threadId, NULL); // SEGV in here!!!

The child thread that is joined at exit runs the following code, which begins at the point where releaseCond() is called in the main thread:

// Wait for main thread to tell us to exit:
waitOnCond(mtx(), startCond(), &startCount);
// Tell the main thread we are done so it will do pthread_join():
releaseCond(mtx(), endCond(), &endCount);
// At this point the main thread could call pthread_join() while we 
// call pthread_exit().
pthread_exit(NULL);

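The thread appeared to be running properly: no error codes were produced when it was created during application startup, and it performed its task correctly, which took around five seconds before the application exited.

What might cause this rare SEGV, and how can I program defensively against it? One claim is that my call to pthread_detach() is the problem. If so, how should my code be corrected?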

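Assuming:

pthread_create returns zero (you are checking it, right?)
attr is a valid pthread_attr_t object (how are you creating it? Why not just pass NULL instead?)
attr does not specify that the thread is to be created detached
You did not call pthread_detach or pthread_join on the thread somewhere else

...then it is "impossible" for pthread_join to fail, and you either have some other memory corruption or a bug in your runtime.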
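To make those assumptions checkable, here is a minimal sketch of creating an explicitly joinable thread with every return code tested. The names `worker` and `create_and_join_checked` are made up for illustration and are not from the question. Note that the pthread functions report errors through their return value, so an errno of zero says little about whether one of them failed.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical start routine: does no work and returns its argument.
static void* worker(void* arg) { return arg; }

// Creates an explicitly joinable thread, checks every return code, joins it.
// Returns 0 on success, or the first non-zero pthread error number.
int create_and_join_checked() {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;

    // Joinable is already the default; setting it documents the intent.
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    pthread_t tid;
    if (rc == 0) rc = pthread_create(&tid, &attr, worker, nullptr);
    pthread_attr_destroy(&attr);  // attr is not needed once creation is done

    if (rc != 0) {
        // The error number is the return value, not errno.
        std::fprintf(stderr, "thread setup failed: %s\n", std::strerror(rc));
        return rc;
    }
    return pthread_join(tid, nullptr);
}
```

Calling `create_and_join_checked()` from main should return 0 on a healthy system. Passing NULL instead of `&attr` would give the same default (joinable) thread.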

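[UPDATE]

The RATIONALE section for pthread_detach says:

The *pthread_join*() or *pthread_detach*() functions should eventually be called for every thread that is created so that storage associated with the thread may be reclaimed.

Although it does not say these are mutually exclusive, the pthread_join documentation specifies:

The behavior is undefined if the value specified by the thread argument to *pthread_join*() does not refer to a joinable thread.

I am having trouble finding the exact wording that says a detached thread is not joinable, but I am pretty sure it is true.

So, either call pthread_join or pthread_detach, but not both.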
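As a rough illustration of "one or the other", the sketch below shows both options side by side. All names here (`run_joined`, `run_detached`, the `g_*` globals) are hypothetical, and the plain mutex and condition variable only stand in for the question's releaseCond()/waitOnCond() helpers, whose implementation is not shown.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical shared state the detached worker uses to report completion.
static pthread_mutex_t g_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t g_done_cond = PTHREAD_COND_INITIALIZER;
static bool g_done = false;

static void* joined_worker(void*) { return nullptr; }

static void* detached_worker(void*) {
    pthread_mutex_lock(&g_mtx);
    g_done = true;
    pthread_cond_signal(&g_done_cond);
    pthread_mutex_unlock(&g_mtx);
    return nullptr;
}

// Option 1: leave the thread joinable and reclaim it with pthread_join().
int run_joined() {
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, joined_worker, nullptr);
    if (rc != 0) return rc;
    return pthread_join(tid, nullptr);  // no pthread_detach() anywhere
}

// Option 2: detach the thread, never join it, and wait on a condition instead.
int run_detached() {
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, detached_worker, nullptr);
    if (rc != 0) return rc;
    rc = pthread_detach(tid);
    if (rc != 0) return rc;

    pthread_mutex_lock(&g_mtx);
    while (!g_done) pthread_cond_wait(&g_done_cond, &g_mtx);
    pthread_mutex_unlock(&g_mtx);
    return 0;  // tid must not be passed to pthread_join() from here on
}
```

Applied to the question's code, option 1 would mean simply dropping the pthread_detach() call and keeping the pthread_join(); option 2 would mean keeping the detach and dropping the join.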

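If you read the standards documentation for pthread_join and pthread_exit and the related pages, the join suspends execution "until the target thread terminates", and the thread calling pthread_exit does not terminate until it has finished calling pthread_exit, so what you are worried about cannot be the problem.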
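A small sketch of that guarantee, with made-up names (`exiting_worker`, `join_after_exit`, `g_result`): the worker calls pthread_exit() explicitly, and the joiner only resumes once the worker has terminated, at which point it can read the exit status. This assumes the thread is joinable, i.e. it was never detached.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical result storage; it must outlive the worker thread.
static int g_result = 0;

static void* exiting_worker(void*) {
    g_result = 42;
    pthread_exit(&g_result);  // terminates the thread; does not return
}

// Joins the worker and returns the value it published, or -1 on error.
int join_after_exit() {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, exiting_worker, nullptr) != 0) return -1;

    void* status = nullptr;
    // Blocks until exiting_worker has terminated, however long pthread_exit takes.
    if (pthread_join(tid, &status) != 0) return -1;
    return *static_cast<int*>(status);
}
```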

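You may have corrupted memory somewhere (as Nemo suggests), or called pthread_exit from a cleanup handler (as user315052 suggests), or something else. But it is not "a race condition between pthread_join() and pthread_exit()", unless you are on a buggy or non-compliant implementation.

There is insufficient information to fully diagnose your problem. I concur with the other posted answers that the problem is more likely undefined behavior in your code than a race condition between pthread_join and pthread_exit. But I would also agree that the existence of such a race would constitute a bug in the pthread library implementation.

Regarding pthread_join: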

return_val = pthread_create(&_threadId, &attr,
                            (void *(*)(void *))initialize,
                            (void *)this);
//...
pthread_join(_threadId, NULL); // SEGV in here!!!

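It looks like the join is inside a class. This opens up the possibility that the object is deleted while main is trying to do the join. If pthread_join is accessing freed memory, the result is undefined behavior. I am leaning towards this possibility, since accessing freed memory very often goes undetected.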
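One defensive pattern, sketched here under the assumption that the class owns its thread (the question does not show the class, so `OwnedThread` and everything in it is hypothetical): have the object join its thread before it is destroyed, so neither the pthread_t nor the data the thread uses can be freed while the thread is still running.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical owner class (not from the question): the object joins its
// thread before it is destroyed, so the thread never touches freed memory.
class OwnedThread {
public:
    OwnedThread() : joinable_(false), ticks_(0) {
        joinable_ =
            (pthread_create(&tid_, nullptr, &OwnedThread::entry, this) == 0);
    }
    ~OwnedThread() { join(); }  // join before any member is destroyed

    OwnedThread(const OwnedThread&) = delete;
    OwnedThread& operator=(const OwnedThread&) = delete;

    // Joins at most once; returns the pthread error number (0 on success).
    int join() {
        if (!joinable_) return 0;
        joinable_ = false;
        return pthread_join(tid_, nullptr);
    }

    int ticks() const { return ticks_; }  // only read this after join()

private:
    static void* entry(void* self) {
        static_cast<OwnedThread*>(self)->ticks_ = 1;  // uses the live object
        return nullptr;
    }

    pthread_t tid_;
    bool joinable_;
    int ticks_;
};

// Usage: the object outlives its thread because join() runs before destruction.
int run_owned_thread() {
    OwnedThread t;
    if (t.join() != 0) return -1;
    return t.ticks();
}
```

With this shape there is no separate "thread is done" handshake to get wrong: the join itself is the point after which it is safe to delete the object.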

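Regarding pthread_exit: the man page on Linux and the POSIX spec state:

An implicit call to pthread_exit() is made when a thread other than the thread in which main() was first invoked returns from the start routine that was used to create it. The function's return value shall serve as the thread's exit status.
The behavior of pthread_exit() is undefined if called from a cancellation cleanup handler or destructor function that was invoked as a result of either an implicit or explicit call to pthread_exit().

If the pthread_exit call is made in a cleanup handler, you will have undefined behavior.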
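A sketch of the safe shape, using hypothetical names (`release_resources`, `returning_worker`, `run_returning_worker`): the cleanup handler only releases resources and never calls pthread_exit(), and the start routine ends by returning, which is the implicit pthread_exit() the spec describes.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical flag standing in for "resources were released".
static bool g_cleaned_up = false;

// Cleanup handler: only releases resources; it must not call pthread_exit().
static void release_resources(void*) { g_cleaned_up = true; }

static void* returning_worker(void*) {
    static int status = 7;  // static so it outlives the thread

    pthread_cleanup_push(release_resources, nullptr);
    // ... work that could be cancelled would go here ...
    pthread_cleanup_pop(1);  // non-zero argument: run the handler now

    return &status;  // returning is an implicit pthread_exit(&status)
}

// Returns the worker's exit status if cleanup ran, or -1 on any failure.
int run_returning_worker() {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, returning_worker, nullptr) != 0) return -1;

    void* status = nullptr;
    if (pthread_join(tid, &status) != 0) return -1;
    return g_cleaned_up ? *static_cast<int*>(status) : -1;
}
```

In C++ a plain return also lets local destructors run in the usual way, which is one more reason to prefer it over an explicit pthread_exit() at the end of the start routine.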
