What I need to do is train two regression models (using scikit-learn) on the same data at the same time, each on a different core. I tried to work this out myself using Process, but without success.
from sklearn.ensemble import GradientBoostingRegressor
from multiprocessing import Process

gb1 = GradientBoostingRegressor(n_estimators=10)
gb2 = GradientBoostingRegressor(n_estimators=100)

def train_model(model, data, target):
    model.fit(data, target)

live_data  # Pandas DataFrame object
target     # Numpy array object

p1 = Process(target=train_model, args=(gb1, live_data, target))  # same data
p2 = Process(target=train_model, args=(gb2, live_data, target))  # same data
p1.start()
p2.start()
If I run the code above, I get the following error when trying to start the p1 process:
Traceback (most recent call last):
File "<pyshell#28>", line 1, in <module>
p1.start()
File "C:Python27libmultiprocessingprocess.py", line 130, in start
self._popen = Popen(self)
File "C:Python27libmultiprocessingforking.py", line 274, in __init__
to_child.close()
IOError: [Errno 22] Invalid argument
I am running all of this on Windows (in IDLE). How should I do this, any suggestions?
OK... after several hours of struggling, I am posting my solution. First things first: if you are on Windows and using the interactive interpreter, you need to wrap all of your code under a 'main' condition, with the exception of function definitions and imports. This is because on Windows a spawned child process re-imports the script, so without the guard the process-spawning code would run again in the child and loop.
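For illustration only, here is a minimal sketch of that guard applied to the Process attempt from the question, with random arrays standing in for the live_data and target placeholders:

    from sklearn.ensemble import GradientBoostingRegressor
    from multiprocessing import Process
    import numpy as np

    def train_model(model, data, target):
        # runs in the child process; fits the child's own copy of the model
        model.fit(data, target)

    if __name__ == '__main__':
        # dummy stand-ins for the real DataFrame and target array
        live_data = np.random.rand(100, 4)
        target = np.random.rand(100)

        gb1 = GradientBoostingRegressor(n_estimators=10)
        gb2 = GradientBoostingRegressor(n_estimators=100)

        p1 = Process(target=train_model, args=(gb1, live_data, target))
        p2 = Process(target=train_model, args=(gb2, live_data, target))
        p1.start()
        p2.start()
        p1.join()
        p2.join()

Note, though, that each child fits its own copy of the model, so the fitted state never reaches the parent's gb1/gb2. That is exactly why the solution below uses a Pool and returns the fitted model from the worker instead.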
My solution is as follows:
from sklearn.ensemble import GradientBoostingRegressor
from multiprocessing import Pool
from itertools import repeat
def train_model(params):
    # Pool passes a single argument to the worker, so we pack
    # (model, data, target) into one tuple and unpack it here
    model, data, target = params
    model.fit(data, target)
    return model

if __name__ == '__main__':
    gb1 = GradientBoostingRegressor(n_estimators=10)
    gb2 = GradientBoostingRegressor(n_estimators=100)

    live_data  # Pandas DataFrame object
    target     # Numpy array object

    po = Pool(2)  # 2 is the number of processes we want to spawn
    gb1, gb2 = po.map_async(train_model,
                            zip([gb1, gb2], repeat(live_data), repeat(target))
                            # zip packs the models and the repeated data/target
                            # into one iterable of argument tuples
                            ).get()
    # map_async dispatches the tasks to the pool; get() blocks until
    # both fits are done and returns the fitted models
    po.terminate()
    # kill the spawned processes
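The objects returned by get() are the fitted models, so once it comes back you can use gb1 and gb2 directly in the parent process. A quick, purely illustrative check (the predict calls are not part of the original answer):

    # gb1 and gb2 are the fitted models returned by the worker processes
    print(gb1.predict(live_data)[:5])
    print(gb2.predict(live_data)[:5])

As a side note, po.close() followed by po.join() is the gentler way to shut the pool down once all work has been submitted, but since get() has already collected the results, terminate() works fine here too.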