多线程递归程序 c++

我正在研究一种递归算法，我们希望将其并行化以提高性能。

我使用 Visual c++ 12.0 和<线程>库实现了多线程。但是我没有看到任何性能改进。所花费的时间要么少于几毫秒，要么比单线程花费的时间多。

请让我知道我是否做错了什么，以及我应该对代码进行哪些更正。

这是我的代码

void nonRecursiveFoo(<className> &data, int first, int last)
{
    //process the data between first and last index and set its value to true based on some condition
    //no threads are created here
}

void recursiveFoo(<className> &data, int first, int last)
{
    int partitionIndex = -1;
    data[first]=true;
    data[last]=true;
    for (int i = first + 1; i < last; i++)
    {
        //some logic setting the index 
        If ( some condition is true)
            partitionIndex = i;
    }
//no dependency of partitions on one another and so can be parallelized
    if( partitionIndex != -1)
    {
        data[partitionIndex]=true;
        //assume some threadlimit
        if (Commons::GetCurrentThreadCount() < Commons::GetThreadLimit())
        {
            std::thread t1(recursiveFoo, std::ref(data), first, index);
            Commons::IncrementCurrentThreadCount();
            recursiveFoo(data, partitionIndex , last);
            t1.join();
        }
        else
        {
            nonRecursiveFoo(data, first, partitionIndex );
            nonRecursiveFoo(data, partitionIndex , last);
        }
    }
}

//主要

int main()
{
    recursiveFoo(data,0,data.size-1);
}

//下议院

std::mutex threadCountMutex;
static void Commons::IncrementCurrentThreadCount()
{
    threadCountMutex.lock();
        CurrentThreadCount++;
    threadCountMutex.unlock();
}
static int GetCurrentThreadCount()
{
    return CurrentThreadCount;
}
static void SetThreadLimit(int count)
{
    ThreadLimit = count;
}
static int GetThreadLimit()
{
    return ThreadLimit;
}
static int GetMinPointsPerThread()
{
    return MinimumPointsPerThread;
}

没有进一步的信息（见评论），这主要是猜测，但有几点你应该注意：

首先，确保您的分区逻辑与处理相比非常短且快速。否则，您创造的工作只会比获得的处理能力多。
确保有足够的工作开始，否则加速可能不足以支付线程创建的额外开销。
检查您的工作是否均匀分布在不同的线程中，并且不要生成比计算机上的内核更多的线程（最后打印总线程数 - 不要依赖您的ThreadLimit）。
不要让你的分区变得太小（尤其是不少于64字节），否则你最终会得到错误的共享。
CurrentThreadCount实现为std::atomic<int>会更有效，在这种情况下，您不需要互斥锁。
将计数器的增量放在创建线程之前。否则，新创建的线程可能会在递增之前读取计数器并再次生成一个新线程，即使已经达到最大线程数（这仍然不是一个完美的解决方案，但如果您已经验证，我只会为此投入更多时间，过度使用是您的实际问题）
如果您确实必须使用互斥锁（出于示例代码之外的原因），则必须将其用于每次访问CurrentThreadCount（读取和写入访问）。否则，严格来说，这是一种竞争条件，因此是 UB。

通过使用

t1.join，您基本上是在等待另一个线程完成 - 即不并行执行任何操作。

通过查看您的算法，我不明白如何使用线程并行化（从而改进）它 - 您必须等待单个递归调用结束。

首先，在创建的线程完成之前，您不会并行执行任何操作，因为每个线程创建都会阻塞。因此，您的多线程代码将始终比非多线程版本慢。

为了并行化，你可以为调用非递归函数的该部分生成线程，将线程 ID 放入向量中，并通过遍历向量来连接递归的最高级别。（虽然有更优雅的方法可以做到这一点，但我认为这是可以的）。

因此，所有非递归调用都将并行运行。但是你应该使用另一个条件，而不是最大线程数，而是问题的大小，例如 last-first<threshold .

相关内容

最新更新

热门标签：