我正在尝试使用sci-kit构建决策树。但是我的价值与所有值的预测相同。
le = preprocessing.LabelEncoder()
def labelEncoder(df, col_name):
df[[col_name]] = le.fit_transform(df[[col_name]])
labelEncoder(dfr, "Gender")
labelEncoder(dfr, "Subscription Tenure Type")
labelEncoder(dfr, "Located Region")
labelEncoder(dfr, "Attrition")
labelEncoder(dfr, "Type of subscription")
labelEncoder(dfr, "Genre")
# # Splitiing the data to test and train
feature = dfr[["Gender", "Age", "Subscription year", "Subscription Tenure Type", "Type of subscription",
"Located Region", "Average Hours of watching(Weekly)", "Attrition",
"Web channle utilization", "Mobile Channel Utilization"]]
labels = dfr[["Genre"]]
clf_gini = DecisionTreeClassifier(criterion="entropy", random_state=100,
max_depth=3, min_samples_leaf=9 ,min_samples_split=2, splitter='random')
clf_gini.fit(feature_train, labels_train)
y_pred = clf_gini.predict(feature_test)
print(list((y_pred)))
以下是示例数据。
User Id Genre Rating Gender Age Subscription year Subscription Tenure Type Type of subscription Located Region Average Hours of watching(Weekly) Attrition Web channle utilization Mobile Channel Utilization
1 Romance 4 Female 51 2000 Annual Individual R3 7 Yes 89 11
2 Action 4.769230769 Female 42 2004 6 Months Individual R6 13 No 88 12
2 Adventure 4.909090909 Female 42 2004 6 Months Individual R6 13 No 88 12
2 Comedy 4.2 Female 42 2004 6 Months Individual R6 13 No 88 12
2 Crime 5 Female 42 2004 6 Months Individual R6 13 No 88 12
2 Drama 4.2 Female 42 2004 6 Months Individual R6 13 No 88 12
您提供的代码片段存在一些问题。
- 您正在使用
svm
而不是clf_gini
;
预测 - 缺少将数据集实际将数据集分配到火车上的代码;
- 您是否将相同的转换应用于训练和测试集?
您是在调用svm
而不是clf_gini
。如果这没有回答您的问题,请您提供更多详细信息吗?
以下示例代码工作:
import pandas as pd
arr = [[1 , 'Romance', 4, 'Female', 51, 2000, 'Annual' , 'Individual' , 'R3', 7, 'Yes', 89, 11],
[2 , 'Action' , 4.7, 'Female', 42, 2004, '6 Months' , 'Individual', 'R6', 13, 'No', 88, 12],
[2 , 'Adventure', 4.9, 'Female', 42, 2004, '6 Months', 'Individual', 'R6', 13, 'No', 88, 12],
[2 , 'Comedy' , 4.2, 'Female', 42 , 2004, '6 Months' , 'Individual', 'R6' ,13, 'No', 88, 12],
[2 , 'Crime' , 5 , 'Female', 42 , 2004, '6 Months' , 'Individual', 'R6' , 13, 'No', 88, 12],
[2 , 'Drama' , 4.2, 'Female', 42, 2004, '6 Months' , 'Individual', 'R6', 13, 'No', 88, 12]]
headers = ['User Id', 'Genre', 'Rating', 'Gender', 'Age', 'Subscription year', 'Subscription Tenure Type', 'Type of subscription', 'Located Region', 'Average Hours of watching(Weekly)', 'Attrition', 'Web channle utilization', 'Mobile Channel Utilization']
dfr = pd.DataFrame(arr, columns = headers )
import sklearn
le = sklearn.preprocessing.LabelEncoder()
def labelEncoder(df, col_name):
df[[col_name]] = le.fit_transform(df[[col_name]])
labelEncoder(dfr, "Gender")
labelEncoder(dfr, "Subscription Tenure Type")
labelEncoder(dfr, "Located Region")
labelEncoder(dfr, "Attrition")
labelEncoder(dfr, "Type of subscription")
labelEncoder(dfr, "Genre")
# # Splitiing the data to test and train
feature = dfr[["Gender", "Age", "Subscription year", "Subscription Tenure Type", "Type of subscription",
"Located Region", "Average Hours of watching(Weekly)", "Attrition",
"Web channle utilization", "Mobile Channel Utilization"]]
clf_gini = DecisionTreeClassifier(criterion="entropy", random_state=100,
max_depth=3, min_samples_leaf=9 ,min_samples_split=2, splitter='random')
# create test / train split
dfr_train = dfr.iloc[:-1]
dfr_test = dfr.iloc[-1]
y_train = dfr_train['Genre']
y_test = dfr_test['Genre']
del dfr_train['Genre']
del dfr_test['Genre']
clf_gini.fit(dfr_train, y_train)
y_pred = clf_gini.predict(dfr_test)
print(list((y_pred)))