Hyperparameter optimization in sklearn with n_jobs > 1: pickling



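I'm in a "pickle". This is my code structure:

  • A base class that acts as an abstract class
  • A subclass that can be instantiated
    • A method that sets up the parameters and calls RandomizedSearchCV or GridSearchCV with n_jobs=-1.
      • A local function, create_model, that creates the neural network model to be called by KerasClassifier or KerasRegressor (see this tutorial)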

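I get an error saying that a local object cannot be pickled. If I change to n_jobs=1, there is no problem. So I suspect the problem lies with the local function combined with parallel processing. Is there a way around this? After some googling, it seems that the serializer dill could work here (I even found a package called multiprocessing_on_dill). But I currently rely on sklearn's packages.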
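The error described above can be reproduced without sklearn or Keras. The following is a minimal sketch using only the standard pickle module; build_search and try_pickle are hypothetical names chosen for illustration, and the stand-in create_model returns a string instead of a real network.

```python
import pickle


def build_search():
    """Mimics a method that defines create_model as a local function."""
    def create_model():
        return "model"  # stand-in for a real Keras model
    return create_model


def try_pickle(obj):
    """Return None if obj pickles, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return str(exc)


# The local function cannot be pickled, so it cannot be shipped to
# worker processes that rely on the standard pickle module.
error = try_pickle(build_search())
print(error)
```

With n_jobs=1 nothing has to be sent to another process, which is consistent with the error only appearing for n_jobs > 1.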

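I found a "solution" to the problem. I'm really puzzled why the example here works with n_jobs=-1 but my code does not. The problem seems to lie with the local function create_model residing inside the subclass method. If I turn the local function into a method of the subclass, I can set n_jobs > 1.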
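One likely explanation is that pickle serializes functions by reference, using the module name and the qualified name, and a name containing `<locals>` cannot be looked up again on the receiving side. The sketch below compares the two layouts; the classes Before and After are hypothetical stand-ins for the subclass, not the actual code.

```python
class Before:
    """create_model is a local function inside a method."""

    def run_search(self):
        def create_model():
            return "model"
        return create_model


class After:
    """create_model is a regular method of the class."""

    def create_model(self):
        return "model"

    def run_search(self):
        return self.create_model


local_fn = Before().run_search()
method = After().run_search()

print(local_fn.__qualname__)  # Before.run_search.<locals>.create_model
print(method.__qualname__)    # After.create_model
```

The method can be found again as an attribute of its class, whereas the local function only exists while run_search is executing.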

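So to recap, this is my code structure:

  • A base class that acts as an abstract class
  • A subclass that can be instantiated
    • A method that sets up the parameters and calls RandomizedSearchCV or GridSearchCV with n_jobs=-1.
    • A method, create_model, that creates the neural network model to be called by KerasClassifier or KerasRegressor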

The general idea of the code:

from abc import ABCMeta, abstractmethod
import numpy as np
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
class MLAlgorithms(metaclass=ABCMeta):
    def __init__(self, X_train, y_train, X_test, y_test=None):
        """
        Constructor with train and test data.
        :param X_train: Train descriptor data
        :param y_train: Train observed data
        :param X_test: Test descriptor data
        :param y_test: Test observed data
        """
        ...
    @abstractmethod
    def setmlalg(self, mlalg):
        """
        Sets a machine learning algorithm.
        :param mlalg: Dictionary of the machine learning algorithm.
        """
        pass
    @abstractmethod
    def fitmlalg(self, mlalg, rid=None):
        """
        Fits a machine learning algorithm.
        :param mlalg: Machine learning algorithm
        """
        pass

class MLClassification(MLAlgorithms):
    """
    Main class for classification machine learning algorithms.
    """
    def setmlalg(self, mlalg):
        """
        Sets a classification machine learning algorithm.
        :param mlalg: Dictionary of the classification machine learning algorithm.
        """
        ...
    def fitmlalg(self, mlalg):
        """
        Fits a classification machine learning algorithm.
        :param mlalg: Classification machine learning algorithm
        """
        ...
    # Function to create model, required for KerasClassifier
    def create_model(self, n_layers=1, units=10, input_dim=10, output_dim=1,
                     optimizer="rmsprop", loss="binary_crossentropy",
                     kernel_initializer="glorot_uniform", activation="sigmoid",
                     kernel_regularizer="l2", kernel_regularizer_weight=0.01,
                     lr=0.01, momentum=0.0, decay=0.0, nesterov=False, rho=0.9, epsilon=1E-8,
                     beta_1=0.9, beta_2=0.999, schedule_decay=0.004):
        from keras.models import Sequential
        from keras.layers import Dense
        from keras import regularizers, optimizers
        # Create model
        if kernel_regularizer.lower() == "l1":
            kernel_regularizer = regularizers.l1(l=kernel_regularizer_weight)
        elif kernel_regularizer.lower() == "l2":
            kernel_regularizer = regularizers.l2(l=kernel_regularizer_weight)
        elif kernel_regularizer.lower() == "l1_l2":
            kernel_regularizer = regularizers.l1_l2(l1=kernel_regularizer_weight, l2=kernel_regularizer_weight)
        else:
            print("Warning: Kernel regularizer {0} not supported. Using default 'l2' regularizer.".format(
                kernel_regularizer))
            kernel_regularizer = regularizers.l2(l=kernel_regularizer_weight)
        if optimizer.lower() == "sgd":
            optimizer = optimizers.sgd(lr=lr, momentum=momentum, decay=decay, nesterov=nesterov)
        elif optimizer.lower() == "rmsprop":
            optimizer = optimizers.rmsprop(lr=lr, rho=rho, epsilon=epsilon, decay=decay)
        elif optimizer.lower() == "adagrad":
            optimizer = optimizers.adagrad(lr=lr, epsilon=epsilon, decay=decay)
        elif optimizer.lower() == "adadelta":
            optimizer = optimizers.adadelta(lr=lr, rho=rho, epsilon=epsilon, decay=decay)
        elif optimizer.lower() == "adam":
            optimizer = optimizers.adam(lr=lr, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon, decay=decay)
        elif optimizer.lower() == "adamax":
            optimizer = optimizers.adamax(lr=lr, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon, decay=decay)
        elif optimizer.lower() == "nadam":
            optimizer = optimizers.nadam(lr=lr, beta_1=beta_1, beta_2=beta_2, epsilon=epsilon,
                                         schedule_decay=schedule_decay)
        else:
            print("Warning: Optimizer {0} not supported. Using default 'sgd' optimizer.".format(optimizer))
            optimizer = "sgd"
        model = Sequential()
        model.add(
            Dense(units=units, input_dim=input_dim,
                  kernel_initializer=kernel_initializer, activation=activation,
                  kernel_regularizer=kernel_regularizer))
        for layer_count in range(n_layers - 1):
            model.add(
                Dense(units=units, kernel_initializer=kernel_initializer, activation=activation,
                      kernel_regularizer=kernel_regularizer))
        model.add(Dense(units=output_dim,
                        kernel_initializer=kernel_initializer, activation=activation,
                        kernel_regularizer=kernel_regularizer))
        # Compile model
        model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
        return model

class MLRegression(MLAlgorithms):
    """
    Main class for regression machine learning algorithms.
    """
    ...

I can confirm the same problem when running a KerasClassifier model with parallelization (n_jobs>1) on Windows in jupyter notebook/ipython (there is no problem on Unix).

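I solved this by putting the create_model function that caused the pickle problem into a module and importing that module, instead of defining the function in the environment.

To create a simple module for Python:

  • Create a text file in the same folder where you run your main code and save it as my_module.py
  • Put the definition of the create_model function into that file
  • Instead of defining create_model in your code, import the module with import my_module and call the function from the module as my_module.create_model()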
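
The steps above can be sketched in a single runnable script. To keep it self-contained, the sketch writes my_module.py into a temporary folder rather than next to the main code, and create_model is a stand-in that returns a dict instead of a Keras model; both are assumptions made for illustration.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Contents of my_module.py (stand-in body, not a real Keras model).
MODULE_SOURCE = '''
def create_model(units=10):
    """Placeholder for the real model-building function."""
    return {"units": units}
'''

# Steps 1 and 2: create my_module.py and put create_model in it.
module_dir = tempfile.mkdtemp()
Path(module_dir, "my_module.py").write_text(MODULE_SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

# Step 3: import the module instead of defining the function inline.
import my_module

# The function is now pickled by reference to my_module.create_model.
payload = pickle.dumps(my_module.create_model)
restored = pickle.loads(payload)
result = restored(units=4)
print(result)  # {'units': 4}
```

In real use the worker processes also have to be able to import my_module, which is why the file is saved in the folder the main code runs from.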
