TypeError: get_params() missing 1 required positional argument: 'self'

I am trying to use the scikit-learn package with Python 3.4 to run a grid search:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model.logistic import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV
import pandas as pd
from sklearn.cross_validation import train_test_split
from sklearn.metrics import precision_score, recall_score, accuracy_score
from sklearn.preprocessing import LabelBinarizer
import numpy as np
pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression)
])
parameters = {
    'vect__max_df': (0.25, 0.5, 0.75),
    'vect__stop_words': ('english', None),
    'vect__max_features': (2500, 5000, 10000, None),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'vect__norm': ('l1', 'l2'),
    'clf__penalty': ('l1', 'l2'),
    'clf__C': (0.01, 0.1, 1, 10)
}
if __name__ == '__main__':
    grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, scoring='accuracy', cv=3)
    df = pd.read_csv('SMS Spam Collection/SMSSpamCollection', delimiter='\t', header=None)
    lb = LabelBinarizer()
    X, y = df[1], np.array([number[0] for number in lb.fit_transform(df[0])])
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    grid_search.fit(X_train, y_train)
    print('Best score: ', grid_search.best_score_)
    print('Best parameter set:')
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(best_parameters):
        print(param_name, best_parameters[param_name])

However, it does not run; I get the following error:

Fitting 3 folds for each of 1536 candidates, totalling 4608 fits
Traceback (most recent call last):
  File "/home/xiangru/PycharmProjects/machine_learning_note_with_sklearn/grid search.py", line 36, in <module>
    grid_search.fit(X_train, y_train)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 732, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "/usr/local/lib/python3.4/dist-packages/sklearn/grid_search.py", line 493, in _fit
    base_estimator = clone(self.estimator)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 47, in clone
    new_object_params[name] = clone(param, safe=False)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in clone
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 35, in <listcomp>
    return estimator_type([clone(e, safe=safe) for e in estimator])
  File "/usr/local/lib/python3.4/dist-packages/sklearn/base.py", line 45, in clone
    new_object_params = estimator.get_params(deep=False)
TypeError: get_params() missing 1 required positional argument: 'self'
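The first line of that output is plain arithmetic over the grid: each candidate is one combination of parameter values, so the count is the product of the number of values per parameter (3 × 2 × 4 × 2 × 2 × 2 × 2 × 4 = 1536), and with cv=3 every candidate is fitted three times. A minimal sketch using itertools.product, which enumerates the same set of combinations as scikit-learn's ParameterGrid (not necessarily in the same order):

```python
from itertools import product

# Same grid as in the question; only the number of values per key matters here.
parameters = {
    'vect__max_df': (0.25, 0.5, 0.75),
    'vect__stop_words': ('english', None),
    'vect__max_features': (2500, 5000, 10000, None),
    'vect__ngram_range': ((1, 1), (1, 2)),
    'vect__use_idf': (True, False),
    'vect__norm': ('l1', 'l2'),
    'clf__penalty': ('l1', 'l2'),
    'clf__C': (0.01, 0.1, 1, 10),
}

# One candidate per combination of one value from each parameter.
candidates = [dict(zip(parameters, values)) for values in product(*parameters.values())]
cv_folds = 3
total_fits = len(candidates) * cv_folds
print(len(candidates), total_fits)  # 1536 4608
```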

I also tried just

if __name__ == '__main__':
    pipeline.get_params()

It gives the same error message. Does anyone know how to fix this?

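This error is almost always misleading. What it actually means is that you are calling an instance method on a class rather than on an instance (for example, calling dict.keys() instead of d.keys() on a dict named d).*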
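A minimal sketch of that mistake, using a made-up class (Estimator is hypothetical, not a scikit-learn class): calling the method on an instance works, while calling it on the class itself produces the same TypeError as in the question.

```python
class Estimator:
    """Hypothetical stand-in for an estimator class."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self, deep=True):
        # Instance method: Python passes the instance in as `self`.
        return {"C": self.C}


def call_on_class():
    """Call get_params on the class itself and return the error message."""
    try:
        Estimator.get_params()  # wrong: no instance, so `self` is never supplied
    except TypeError as exc:
        return str(exc)
    return None


instance_params = Estimator().get_params()  # right: called on an instance
class_error = call_on_class()
print(instance_params)  # {'C': 1.0}
print(class_error)
```

On recent Python versions the message is prefixed with the qualified name (Estimator.get_params()), but the "missing 1 required positional argument: 'self'" part is the same.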

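That is exactly what is happening here. The documentation implies that the best_estimator_ attribute (like the estimator parameter of the initializer) is not an estimator instance but an estimator type, and that "an object of that type is instantiated for each grid point."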

So if you want to call its methods, you have to construct an object of that type for some particular grid point.

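However, from a quick look at the docs, if you are trying to get the parameters of the particular best-estimator instance that produced the best score, isn't that just best_params_? (Sorry, this part is a bit of a guess.)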

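For the Pipeline call, you definitely have an instance. The only documentation for that method is a parameter spec showing that it takes one optional argument, deep. But under the hood, it probably forwards the get_params() call to some of its attributes. With ('clf', LogisticRegression), it looks like you are constructing the pipeline with the class LogisticRegression rather than an instance of that class, so if that is what the call ends up being forwarded to, that would explain the problem.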
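To illustrate the forwarding idea, here is a heavily simplified, hypothetical model of a pipeline's get_params. MiniPipeline and LogReg are invented for this sketch and are not scikit-learn's actual implementation; the point is only that forwarding get_params(deep=True) to a step that holds a class fails with the "missing 'self'" error, while a step holding an instance works.

```python
class LogReg:
    """Hypothetical stand-in for LogisticRegression."""

    def __init__(self, C=1.0, penalty="l2"):
        self.C = C
        self.penalty = penalty

    def get_params(self, deep=True):
        return {"C": self.C, "penalty": self.penalty}


class MiniPipeline:
    """Hypothetical, simplified model of how a pipeline collects parameters."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) pairs

    def get_params(self, deep=True):
        out = {"steps": self.steps}
        if deep:
            for name, est in self.steps:
                # Forward the call to each step. If `est` is a class rather
                # than an instance, this call has no `self` and raises TypeError.
                for key, value in est.get_params(deep=True).items():
                    out[f"{name}__{key}"] = value
        return out


good = MiniPipeline([("clf", LogReg())])  # instance: works
bad = MiniPipeline([("clf", LogReg)])     # class: fails on forwarding

good_params = good.get_params()
try:
    bad.get_params()
    bad_error = None
except TypeError as exc:
    bad_error = str(exc)

print(good_params["clf__C"])  # 1.0
print(bad_error)
```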

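* The reason the error says "missing 1 required positional argument: 'self'" instead of "must be called on an instance" or something similar is that in Python, d.keys() effectively becomes dict.keys(d), and calling it that way explicitly is perfectly legal (and sometimes useful). So Python cannot really tell you that dict.keys() is illegal, only that it is missing its self argument.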
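A quick check of the equivalence described in the footnote: d.keys() and dict.keys(d) return the same keys, and dict.keys() with no argument raises a TypeError. For built-in types like dict, the exact wording of that message differs from the "missing ... 'self'" text and varies between Python versions, so the sketch only captures it rather than asserting its content.

```python
d = {"a": 1, "b": 2}

# The usual bound-method call...
bound = list(d.keys())
# ...is effectively the same as calling the function on the class and
# passing the instance explicitly as the first (self) argument.
explicit = list(dict.keys(d))

# Calling it on the class with no instance leaves nothing to act on.
try:
    dict.keys()
    unbound_error = None
except TypeError as exc:
    unbound_error = str(exc)

print(bound, explicit)  # ['a', 'b'] ['a', 'b']
print(unbound_error)
```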

I finally solved the problem. The cause was exactly what abarnert said.

First, I tried:

pipeline = LogisticRegression()
parameters = {
    'penalty': ('l1', 'l2'),
    'C': (0.01, 0.1, 1, 10)
}

and it worked fine.

Following that hint, I changed the pipeline to:

pipeline = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')),
    ('clf', LogisticRegression())
])

Note the () after LogisticRegression. This time it worked.
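To catch this mistake before a long grid search starts, one option is to check each step up front: isinstance(obj, type) is True for a class such as LogisticRegression and False for an instance such as LogisticRegression(). find_class_steps below is a hypothetical helper, not part of scikit-learn; it works on any list of (name, object) pairs, such as the list passed to Pipeline or a pipeline's steps attribute. The Vectorizer and Classifier classes are placeholders so the sketch runs without scikit-learn installed.

```python
def find_class_steps(steps):
    """Return the names of (name, estimator) steps whose estimator is a class.

    Hypothetical helper: a step like ('clf', LogisticRegression) holds the
    class object itself, which is an instance of `type`; a correctly built
    step like ('clf', LogisticRegression()) holds an instance, which is not.
    """
    return [name for name, est in steps if isinstance(est, type)]


class Vectorizer:  # placeholder for TfidfVectorizer
    pass


class Classifier:  # placeholder for LogisticRegression
    pass


broken = [("vect", Vectorizer()), ("clf", Classifier)]    # class, no ()
fixed = [("vect", Vectorizer()), ("clf", Classifier())]   # instance

print(find_class_steps(broken))  # ['clf']
print(find_class_steps(fixed))   # []
```

With real scikit-learn objects the same check could be run as find_class_steps(pipeline.steps) before calling fit.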
