scikit-learn train_test_split does not work in a multiprocessing pool on Linux (armv7l)

When running Python on a Raspberry Pi 3, I see some strange behavior when using train_test_split inside a multiprocessing pool.

I have something like this:

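import multiprocessing
from sklearn.model_selection import train_test_split

def evaluate_Classifier(model, Features, Labels, split_ratio):
    # Each call is expected to draw its own random train/validation split
    X_train, X_val, y_train, y_val = train_test_split(Features, Labels, test_size=split_ratio)
    ...

iterations = 500
pool = multiprocessing.Pool(4)  # 4 worker processes
# w, Current_Features and Current_Labels are defined elsewhere
results = [pool.apply_async(evaluate_Classifier, args=(w, Current_Features, Current_Labels, 0.35)) for i in range(iterations)]
output = [p.get() for p in results]
pool.close()
pool.join()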

The code above runs fine on Windows 7 with Python 3.5.6; in fact, each of the 4 worker processes gets a different train/test split.

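However, when I run it on the Raspberry Pi 3 (scikit-learn 0.19.2), the 4 worker processes appear to split the data in exactly the same way, so they all produce exactly the same results. The next 4 split the data again (differently this time), but still identically to one another, and so on.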

I even tried using train_test_split with random_state=np.random.randint, but it did not help.
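
The symptom described here is consistent with a known pitfall. When worker processes are created by fork (the default start method on Linux for the Python version mentioned), each child inherits a copy of the parent's global NumPy random state, and train_test_split falls back to that global state when random_state is None. A seed drawn with np.random.randint inside a worker comes from the same copied state. On Windows, workers are started as fresh interpreters, each of which seeds NumPy anew. The following is a minimal sketch of the effect using NumPy alone; it assumes a platform where the "fork" start method is available, and draw_in_forked_children is a hypothetical helper name, not code from the question.

```python
import multiprocessing as mp

import numpy as np


def _draw(queue):
    # Uses NumPy's global RandomState, copied from the parent at fork time.
    queue.put(int(np.random.randint(0, 2**31 - 1)))


def draw_in_forked_children(n_children=4):
    """Fork n_children processes and collect one random draw from each."""
    ctx = mp.get_context("fork")  # assumption: a POSIX system such as Linux
    queue = ctx.Queue()
    procs = [ctx.Process(target=_draw, args=(queue,)) for _ in range(n_children)]
    for p in procs:
        p.start()
    values = [queue.get() for _ in procs]  # read before join to avoid blocking
    for p in procs:
        p.join()
    return values


if __name__ == "__main__":
    # The parent never draws between forks, so every child starts from the
    # same state and the collected values are identical.
    print(draw_in_forked_children())
```

If the parent draws a number between forks, or each child is given its own seed, the values diverge again.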

Any idea why this works on Windows but does not seem to parallelize properly on the Raspberry Pi 3?

Many thanks.

Instead of setting the random state, try shuffling the data before splitting. You can do this by setting the parameter shuffle=True.

shuffle is on by default, so passing shuffle=True makes no difference. I would also like to keep splitting the data inside the parallelized function if possible.
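
The point about the default can be checked directly. A small sketch, assuming a scikit-learn version whose train_test_split accepts the shuffle parameter; the toy array is made up for illustration. With a fixed random_state, passing shuffle=True should yield the same split as omitting it.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20)  # toy data, for illustration only

# Same seed, with and without the explicit shuffle flag.
default_train, default_val = train_test_split(X, test_size=0.35, random_state=0)
explicit_train, explicit_val = train_test_split(
    X, test_size=0.35, random_state=0, shuffle=True
)

# True when both calls produced the same validation set in the same order.
same_split = np.array_equal(default_val, explicit_val)
print(same_split)
```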

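Actually, after some digging I found that this comes down to how Windows and Linux differ in handling child processes and their resources. The best solution to the problem above is the following: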

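import multiprocessing
from sklearn.model_selection import train_test_split

def evaluate_Classifier(model, Features, Labels, split_ratio, i):
    # The task index i seeds the split, so every task gets a different one
    X_train, X_val, y_train, y_val = train_test_split(Features, Labels, test_size=split_ratio, random_state=i)
    ...

iterations = 500
pool = multiprocessing.Pool(4)  # 4 worker processes
# w, Current_Features and Current_Labels are defined elsewhere
results = [pool.apply_async(evaluate_Classifier, args=(w, Current_Features, Current_Labels, 0.35, i)) for i in range(iterations)]
output = [p.get() for p in results]
pool.close()
pool.join()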

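This works well. To get more randomness between different runs of the code, we can use a random number generator outside the function to produce the seeds instead of using i.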
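
The last suggestion can be sketched as follows. This is one possible reading, not the author's code: the parent process draws one seed per task and passes it to the worker in place of i, so the splits differ between tasks and between runs. make_seeds and validation_labels are hypothetical names, and the toy arrays stand in for Current_Features and Current_Labels.

```python
import multiprocessing

import numpy as np
from sklearn.model_selection import train_test_split


def make_seeds(n_tasks):
    # Drawn once in the parent, so forked workers cannot duplicate them.
    return [int(s) for s in np.random.randint(0, 2**31 - 1, size=n_tasks)]


def validation_labels(features, labels, split_ratio, seed):
    # Stand-in for evaluate_Classifier: returns only the validation labels.
    X_train, X_val, y_train, y_val = train_test_split(
        features, labels, test_size=split_ratio, random_state=seed
    )
    return y_val.tolist()


if __name__ == "__main__":
    features = np.arange(40).reshape(20, 2)  # toy data, for illustration
    labels = np.arange(20)
    seeds = make_seeds(8)
    with multiprocessing.Pool(4) as pool:
        results = [
            pool.apply_async(validation_labels, args=(features, labels, 0.35, s))
            for s in seeds
        ]
        output = [r.get() for r in results]
    print(output)
```

Logging the seeds alongside the results keeps any individual split reproducible later.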
