Scikit-Learn: Supported target types are: ('binary', 'multiclass'). Got 'unknown' instead



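I have two nominal values for each product (category and producer) along with its price, and I am trying to determine whether, within any given category, a producer tends to have higher prices. In other words, I am trying to measure the impact of a brand on price. I used the Python code below, but it fails to run and gives this error: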

Supported target types are: ('binary', 'multiclass'). Got 'unknown' instead.

Can you help me solve this problem?

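from pandas import read_excel
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Load dataset
path = "Sales.xlsx"
names = ['Category', 'Producer', 'Average_base_price']
dataset = read_excel(path, dtype={'Average_base_price': float}, names=names)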
# creating instance of labelencoder
labelencoder = LabelEncoder()
array = dataset.values
# Split-out validation dataset
X, y = array[:, :-1], array[:, -1]
X[:, 0] = labelencoder.fit_transform(X[:, 0])
X[:, 1] = labelencoder.fit_transform(X[:, 1])
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1)
# Spot Check Algorithms
models = []
models.append(('LR', LogisticRegression(solver='liblinear', multi_class='ovr')))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(gamma='auto')))
# evaluate each model in turn
results = []
names = []
for name, model in models:
    kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))

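You get this error because your dependent variable is continuous and you are trying to run a stratified k-fold on it, which does not make sense: StratifiedKFold keeps the proportion of each class the same across folds, so it needs class labels as the target, not prices.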
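
To see where 'unknown' comes from, here is a minimal sketch using scikit-learn's type_of_target helper on made-up values (the toy rows are assumptions, not the asker's data). Slicing the price column out of a mixed string/float array, as array[:, -1] does in the question, leaves an object-dtype target, which should be reported as 'unknown'. A plain float array of prices is reported as 'continuous'. Neither is a set of class labels that StratifiedKFold can stratify on.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Toy rows (assumed for illustration): category, producer, price.
mixed = np.array(
    [["A", "l", 1.5], ["B", "m", 2.25], ["A", "n", 3.75], ["C", "m", 4.1]],
    dtype=object,
)

# Same slicing as in the question: the target keeps dtype=object.
y_object = mixed[:, -1]

# The same prices converted to a proper float array.
y_float = y_object.astype(float)

print(y_object.dtype, type_of_target(y_object))
print(y_float.dtype, type_of_target(y_float))
```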

If, as you say:

In other words, I am trying to measure the impact of a brand on price.

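Then your dependent variable should be the price, and your independent variables are the nominal values. You should one-hot encode them rather than label encode them, because they are predictors, not labels: LabelEncoder is meant for the target, and the integers it produces would impose an arbitrary order on the categories.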
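
As a small illustration of the difference (the toy producer names are assumed), LabelEncoder maps each category to a single integer, which a linear model would read as an ordering, while OneHotEncoder gives each category its own 0/1 column:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Toy producer column (assumed for illustration).
producers = np.array(["l", "m", "n", "m"])

# Label encoding: one integer per category, implying l < m < n.
label_codes = LabelEncoder().fit_transform(producers)

# One-hot encoding: one indicator column per category, no implied order.
# OneHotEncoder expects a 2-D input, hence the reshape.
onehot = OneHotEncoder().fit_transform(producers.reshape(-1, 1)).toarray()

print(label_codes)
print(onehot)
```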

Using an example dataset:

from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression
import numpy as np
import pandas as pd

# Random stand-in for the Sales.xlsx data
dataset = pd.DataFrame({'Category': np.random.choice(['A', 'B', 'C'], 100),
                        'Producer': np.random.choice(['l', 'm', 'n'], 100),
                        'Average_base_price': np.random.uniform(0, 1, 100)})

One-hot encode the predictors:

enc = OneHotEncoder(handle_unknown='ignore')
X = enc.fit_transform(dataset[['Category','Producer']])
y = dataset[['Average_base_price']]
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1)

Then fit the model. In this case you can use, for example, a simple linear regression:

model = LinearRegression()
cv_results = cross_val_score(model, X_train, Y_train, cv=10)
