How do I include the OneHotEncoder step in a saved model for deployment with Flask?



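I have an XGBoost model that predicts whether a student will get into college based on a few features. Part of the model uses OneHotEncoder to transform some columns that hold string values. The model itself works without any problems, but I am running into trouble building a basic Flask app that accepts JSON to make predictions. My confusion is: where do I add the OneHotEncoder step? Do I need to rebuild the model as a pipeline containing the OneHotEncoding, the model parameters, and the fit, save that model again, and then, when I send JSON through Postman, the saved model will run the data through the OneHotEncoder step? Can I add OneHotEncoder as part of the GridSearchCV step?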

optimal_params = GridSearchCV(
    estimator=xgb.XGBClassifier(objective='binary:logistic'),
    param_grid=param_grid,
    scoring='roc_auc',
    verbose=2,
    n_jobs=10,
    cv=3
)
optimal_params.fit(X,
                   y,
                   early_stopping_rounds=10,
                   eval_metric='auc',
                   eval_set=[(X_test, y_test)],
                   verbose=False)

Flask code:

def return_prediction(college_model, sample_json):
    # Pull each feature out of the incoming JSON payload
    type_school = sample_json["type_school"]
    school_accreditation = sample_json["school_accreditation"]
    gender = sample_json['gender']
    interest = sample_json['interest']
    residence = sample_json['residence']
    parent_age = sample_json['parent_age']
    parent_salary = sample_json['parent salary']
    house_area = sample_json['house_area']
    average_grades = sample_json['average_grades']
    parent_was_in_college = sample_json['parent_was_in_college']

    # A single row, in the same feature order used for training
    college = [[type_school, school_accreditation, gender, interest,
                residence, parent_age, parent_salary, house_area,
                average_grades, parent_was_in_college]]

    class_ind = college_model.predict(college)
    return class_ind

import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/")
def index():
    return '<h1>Flask Running</h1>'

college_model = joblib.load("college_model.pkl")
column_trans = joblib.load("ohe.pkl")  # fitted encoder, saved separately

@app.route('/college', methods=['POST'])
def prediction():
    content = request.json
    results = return_prediction(college_model, content)
    results = results.tolist()
    return jsonify(results)

if __name__ == '__main__':
    app.run()

After GridSearchCV finishes, you keep the final fitted object and save its model to a pkl file.
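On the question of whether the encoder can be part of the GridSearchCV step: one alternative, which is not what this answer goes on to do, is to wrap the encoder and the classifier in a scikit-learn `Pipeline` and tune that. The saved pkl file then contains both steps. The sketch below is only an illustration under assumptions: the training data is made up, only two of the question's columns are used, and `LogisticRegression` stands in for `xgb.XGBClassifier` to keep the example light.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data using two of the question's column names.
X = pd.DataFrame({
    "gender": ["Male", "Female"] * 6,
    "average_grades": [85.0, 90.5, 70.0, 65.5, 88.0, 72.0,
                       91.0, 60.0, 79.5, 83.0, 68.0, 95.0],
})
y = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1]

pipeline = Pipeline([
    # One-hot encode the string column, pass numeric columns through unchanged.
    ("encode", ColumnTransformer(
        [("ohe", OneHotEncoder(handle_unknown="ignore"), ["gender"])],
        remainder="passthrough",
    )),
    # Stand-in classifier (assumption); the question uses xgb.XGBClassifier.
    ("model", LogisticRegression(max_iter=1000)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
search = GridSearchCV(pipeline, param_grid={"model__C": [0.1, 1.0]},
                      scoring="roc_auc", cv=3)
search.fit(X, y)

# Save the refitted best pipeline: encoder and model travel together.
model_path = os.path.join(tempfile.mkdtemp(), "college_model.pkl")
joblib.dump(search.best_estimator_, model_path)

# Later, e.g. in the Flask app: load once and predict on raw, unencoded rows.
reloaded = joblib.load(model_path)
sample = pd.DataFrame([{"gender": "Female", "average_grades": 88.0}])
prediction = reloaded.predict(sample)
```

With this layout the Flask function only needs to turn the JSON into a one-row DataFrame with the training column names; the encoding happens inside `predict`.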

You have to put the preprocessing step into your Flask code.

The incoming data must be one-hot encoded, in the same way as the training data, before the model can make a prediction on it.
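If the fitted encoder was saved separately, as the `ohe.pkl` / `column_trans` lines in the question suggest, one way to do this is to call its `transform` method on the incoming row before `predict`. The following is a minimal sketch under assumptions: the data is invented, only two columns are shown, `LogisticRegression` is a stand-in for the XGBoost model, and the extra `feature_columns` argument is hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data with two of the question's columns.
train = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "average_grades": [85.0, 90.5, 70.0, 65.5],
})
y = [1, 1, 0, 0]
feature_columns = list(train.columns)

# Fit the encoder once, at training time; this is the object to save as ohe.pkl.
column_trans = ColumnTransformer(
    [("ohe", OneHotEncoder(handle_unknown="ignore"), ["gender"])],
    remainder="passthrough",
)
model = LogisticRegression(max_iter=1000).fit(column_trans.fit_transform(train), y)


def return_prediction(model, column_trans, feature_columns, sample_json):
    # Build a one-row frame with the training column names and order.
    row = pd.DataFrame([sample_json], columns=feature_columns)
    # transform (not fit_transform): reuse the categories learned in training.
    encoded = column_trans.transform(row)
    return model.predict(encoded)


prediction = return_prediction(
    model, column_trans, feature_columns,
    {"gender": "Female", "average_grades": 88.0},
)
```

Using `transform` rather than refitting keeps the encoded columns identical to the ones the model saw during training, and `handle_unknown="ignore"` avoids an error when a request contains a category that was not in the training data.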

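import pandas as pd

def return_prediction(college_model, sample_json):
    type_school = sample_json["type_school"]
    school_accreditation = sample_json["school_accreditation"]
    gender = sample_json['gender']
    interest = sample_json['interest']
    residence = sample_json['residence']
    parent_age = sample_json['parent_age']
    parent_salary = sample_json['parent salary']
    house_area = sample_json['house_area']
    average_grades = sample_json['average_grades']
    parent_was_in_college = sample_json['parent_was_in_college']

    # Column names are assumed here; use the names from your training data.
    columns = ['type_school', 'school_accreditation', 'gender', 'interest',
               'residence', 'parent_age', 'parent_salary', 'house_area',
               'average_grades', 'parent_was_in_college']
    college = pd.DataFrame([[type_school, school_accreditation, gender, interest,
                             residence, parent_age, parent_salary, house_area,
                             average_grades, parent_was_in_college]],
                           columns=columns)
    # 'in your str data' is a placeholder: replace it with the names of
    # your string-valued columns.
    college = pd.get_dummies(college, columns=['in your str data'])
    # The encoded columns must match the columns the model was trained on.

    class_ind = college_model.predict(college)
    return class_ind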
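One caveat with `pd.get_dummies` on a single incoming row: it only creates dummy columns for the categories that appear in that row, so the result can have fewer columns than the training matrix. A possible workaround, sketched below with made-up data and a hypothetical `training_columns` list saved at training time, is to reindex the encoded row against the training columns.

```python
import pandas as pd

# Hypothetical training frame; in practice this is the data the model was fit on.
train = pd.DataFrame({
    "residence": ["Urban", "Rural", "Urban"],
    "parent_age": [50, 45, 52],
})
train_encoded = pd.get_dummies(train, columns=["residence"], dtype=int)

# Keep the encoded column names next to the model (e.g. in their own pkl file).
training_columns = list(train_encoded.columns)

# A single incoming request only contains one category.
sample = pd.DataFrame([{"residence": "Rural", "parent_age": 48}])
sample_encoded = pd.get_dummies(sample, columns=["residence"], dtype=int)

# Add the missing dummy columns as 0 and restore the training column order.
sample_aligned = sample_encoded.reindex(columns=training_columns, fill_value=0)
```

The aligned frame has exactly the training columns, so it can be passed to `college_model.predict`. Categories never seen during training are dropped by the reindex.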