Why am I getting the same prediction for every row?

I'm trying to build a decision tree with scikit-learn, but I get the same predicted value for every row.

from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier

# dfr is a pandas DataFrame holding the data shown below
le = preprocessing.LabelEncoder()
def labelEncoder(df, col_name):
    # LabelEncoder expects a 1-D column, so read and assign the Series df[col_name]
    df[col_name] = le.fit_transform(df[col_name])
labelEncoder(dfr, "Gender")
labelEncoder(dfr, "Subscription Tenure Type")
labelEncoder(dfr, "Located Region")
labelEncoder(dfr, "Attrition")
labelEncoder(dfr, "Type of subscription")
labelEncoder(dfr, "Genre")
# Splitting the data into test and train
feature = dfr[["Gender", "Age", "Subscription year", "Subscription Tenure Type", "Type of subscription",
               "Located Region", "Average Hours of watching(Weekly)", "Attrition",
               "Web channle utilization", "Mobile Channel Utilization"]]
labels = dfr["Genre"]
# (the code that produces feature_train, feature_test, labels_train and labels_test is not shown)
clf_gini = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                  max_depth=3, min_samples_leaf=9, min_samples_split=2, splitter='random')
clf_gini.fit(feature_train, labels_train)
y_pred = clf_gini.predict(feature_test)
print(list(y_pred))

Here is some sample data:

User Id Genre   Rating  Gender  Age Subscription year   Subscription Tenure Type    Type of subscription    Located Region  Average Hours of watching(Weekly)   Attrition   Web channle utilization Mobile Channel Utilization
1   Romance 4   Female  51  2000    Annual  Individual  R3  7   Yes 89  11
2   Action  4.769230769 Female  42  2004    6 Months    Individual  R6  13  No  88  12
2   Adventure   4.909090909 Female  42  2004    6 Months    Individual  R6  13  No  88  12
2   Comedy  4.2 Female  42  2004    6 Months    Individual  R6  13  No  88  12
2   Crime   5   Female  42  2004    6 Months    Individual  R6  13  No  88  12
2   Drama   4.2 Female  42  2004    6 Months    Individual  R6  13  No  88  12

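There are a few problems with the code snippet you provided:

You are using svm instead of clf_gini;
The code that actually splits the dataset into training and test sets is missing;
Are you applying the same transformations to both the training set and the test set?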
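For the second and third points, one common pattern (a sketch, not the only way) is to split first, then fit the encoder on the training rows only and reuse that same fitted encoder on the test rows. The example uses hypothetical toy data with a subset of the question's columns, and swaps in scikit-learn's OrdinalEncoder for the feature columns in place of the question's single shared LabelEncoder (in the question's code, the one `le` object is refitted for every column, so afterwards it only remembers the last column it saw). The `handle_unknown="use_encoded_value"` option assumes scikit-learn 0.24 or later.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data using a few of the question's columns
dfr = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male"],
    "Age": [51, 42, 35, 28, 44, 39, 61, 23],
    "Located Region": ["R3", "R6", "R3", "R1", "R6", "R1", "R3", "R6"],
    "Genre": ["Romance", "Action", "Romance", "Action", "Drama", "Action", "Drama", "Romance"],
})
categorical = ["Gender", "Located Region"]
features = ["Gender", "Age", "Located Region"]

# 1. Split first, so the test rows play no part in fitting the encoder
X_train, X_test, y_train, y_test = train_test_split(
    dfr[features], dfr["Genre"], test_size=0.25, random_state=0)

# 2. Fit the encoder on the training rows only, then apply the SAME fitted
#    encoder to both sets; categories unseen in training are encoded as -1
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
X_train = X_train.copy()
X_test = X_test.copy()
X_train[categorical] = enc.fit_transform(X_train[categorical])
X_test[categorical] = enc.transform(X_test[categorical])

# 3. String class labels can be passed to the classifier directly,
#    so the target column does not need a LabelEncoder
clf = DecisionTreeClassifier(criterion="entropy", random_state=100)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(list(y_pred))
```

Because `predict` returns the original string labels, the predictions can be read without decoding them.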

You are calling svm instead of clf_gini. If that does not answer your question, could you provide more details?
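A separate thing worth checking when every prediction is identical: the tree is built with min_samples_leaf=9, so every leaf must contain at least 9 training rows. If the training set has fewer than 18 rows, no split can satisfy that, the tree stays a single leaf, and predict() returns the majority training class for every input. The sketch below shows the effect on hypothetical toy data (the ages and labels are made up); get_n_leaves() reports how many leaves the fitted tree ended up with.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: 12 rows whose label is fully determined by age
X = np.array([[20], [22], [24], [25], [27], [29], [31], [45], [50], [55], [60], [65]])
y = np.array(["Action"] * 7 + ["Drama"] * 5)

# The question's settings: each leaf needs at least 9 samples, and 12 rows
# cannot be divided into two leaves of 9 or more, so no split is possible
constrained = DecisionTreeClassifier(criterion="entropy", random_state=100, max_depth=3,
                                     min_samples_leaf=9, min_samples_split=2, splitter="random")
constrained.fit(X, y)
print(constrained.get_n_leaves())                     # 1
print(sorted(set(constrained.predict(X).tolist())))   # ['Action'] (the majority class)

# With the default min_samples_leaf=1 the tree can use the feature
relaxed = DecisionTreeClassifier(criterion="entropy", random_state=100, max_depth=3)
relaxed.fit(X, y)
print(relaxed.get_n_leaves())                         # 2
print(sorted(set(relaxed.predict(X).tolist())))       # ['Action', 'Drama']
```

On real data the right value for min_samples_leaf depends on how many training rows there are; lowering it (or leaving it at the default) is an assumption to test, not a guaranteed fix.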

The following example code works:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

arr = [[1, 'Romance',   4,   'Female', 51, 2000, 'Annual',   'Individual', 'R3', 7,  'Yes', 89, 11],
       [2, 'Action',    4.7, 'Female', 42, 2004, '6 Months', 'Individual', 'R6', 13, 'No',  88, 12],
       [2, 'Adventure', 4.9, 'Female', 42, 2004, '6 Months', 'Individual', 'R6', 13, 'No',  88, 12],
       [2, 'Comedy',    4.2, 'Female', 42, 2004, '6 Months', 'Individual', 'R6', 13, 'No',  88, 12],
       [2, 'Crime',     5,   'Female', 42, 2004, '6 Months', 'Individual', 'R6', 13, 'No',  88, 12],
       [2, 'Drama',     4.2, 'Female', 42, 2004, '6 Months', 'Individual', 'R6', 13, 'No',  88, 12]]
headers = ['User Id', 'Genre', 'Rating', 'Gender', 'Age', 'Subscription year',
           'Subscription Tenure Type', 'Type of subscription', 'Located Region',
           'Average Hours of watching(Weekly)', 'Attrition',
           'Web channle utilization', 'Mobile Channel Utilization']
dfr = pd.DataFrame(arr, columns=headers)

le = LabelEncoder()
def labelEncoder(df, col_name):
    # LabelEncoder works on a 1-D column, so use df[col_name] rather than df[[col_name]]
    df[col_name] = le.fit_transform(df[col_name])
labelEncoder(dfr, "Gender")
labelEncoder(dfr, "Subscription Tenure Type")
labelEncoder(dfr, "Located Region")
labelEncoder(dfr, "Attrition")
labelEncoder(dfr, "Type of subscription")
labelEncoder(dfr, "Genre")

# Select the feature columns and the label column
feature = dfr[["Gender", "Age", "Subscription year", "Subscription Tenure Type", "Type of subscription",
               "Located Region", "Average Hours of watching(Weekly)", "Attrition",
               "Web channle utilization", "Mobile Channel Utilization"]]
labels = dfr["Genre"]

clf_gini = DecisionTreeClassifier(criterion="entropy", random_state=100,
                                  max_depth=3, min_samples_leaf=9, min_samples_split=2, splitter='random')

# Create the train/test split: every row except the last for training, the last row for testing.
# iloc[-1:] (not iloc[-1]) keeps the test set 2-D, which predict() requires.
feature_train, feature_test = feature.iloc[:-1], feature.iloc[-1:]
labels_train, labels_test = labels.iloc[:-1], labels.iloc[-1:]

clf_gini.fit(feature_train, labels_train)
y_pred = clf_gini.predict(feature_test)
print(list(y_pred))
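One more observation about the sample data: user 2 has five rows whose feature values are identical but whose Genre differs. A decision tree routes identical feature vectors to the same leaf, so it can only ever predict one genre for all five of them. A check along these lines can show how often that happens in the full dataset; it is a sketch that rebuilds only a few of the sample columns, and `feature_cols` and `conflicts` are illustrative names.

```python
import pandas as pd

# A few columns of the six sample rows from the question
dfr = pd.DataFrame({
    "User Id": [1, 2, 2, 2, 2, 2],
    "Genre": ["Romance", "Action", "Adventure", "Comedy", "Crime", "Drama"],
    "Age": [51, 42, 42, 42, 42, 42],
    "Subscription year": [2000, 2004, 2004, 2004, 2004, 2004],
    "Located Region": ["R3", "R6", "R6", "R6", "R6", "R6"],
})
feature_cols = ["Age", "Subscription year", "Located Region"]

# Number of distinct Genre labels sharing each identical combination of feature values
conflicts = dfr.groupby(feature_cols)["Genre"].nunique()

# Combinations with more than one label cannot all be predicted correctly;
# here user 2's single feature combination maps to 5 different genres
print(conflicts[conflicts > 1])
```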
