Deploying a machine learning model with one-hot encoded features

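I have trained an XGBoost classifier on categorical features that I one-hot encoded beforehand. For example, I have a categorical feature "year" that takes values from 2014 to 2018. After one-hot encoding it, I get 5 binary features: Year_2014, Year_2015, Year_2016, Year_2017, Year_2018. What happens if I predict on a sample with year = 2019, given that there is no Year_2019 feature?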
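To make the problem concrete, here is a minimal pandas sketch (assuming the raw column is named `year` and holds integers) of what happens if a 2019 sample is simply passed through `pd.get_dummies` on its own: it produces a column the model never saw and none of the columns the model expects.

```python
import pandas as pd

# Training data covers 2014-2018; the new sample is from an unseen year
train = pd.DataFrame({"year": [2014, 2015, 2016, 2017, 2018]})
new = pd.DataFrame({"year": [2019]})

# Encoding each frame independently yields different column sets
train_cols = list(pd.get_dummies(train, columns=["year"], dtype=int).columns)
new_cols = list(pd.get_dummies(new, columns=["year"], dtype=int).columns)

print(train_cols)  # ['year_2014', 'year_2015', 'year_2016', 'year_2017', 'year_2018']
print(new_cols)    # ['year_2019']
```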

More generally, what is a robust way to transform data so that the model can make predictions on new samples?

Binary features are evaluated as follows:

if(year != ${year value}){
  // Enter "left" branch
} else {
  // Enter "right" branch
}

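An unseen category level is sent to the "left" branch: a value such as 2019 is not equal to any of the values the splits test (equivalently, all of its dummy columns are 0), so it takes the "left" branch at every such split.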
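The rule above can be sketched in plain Python. This is only an illustration, not XGBoost's internal code: the helpers `one_hot` and `branch` and the list `TRAIN_LEVELS` are hypothetical, and each split is assumed to test a single dummy column, as in the pseudocode.

```python
# Hypothetical list of the levels seen during training
TRAIN_LEVELS = [2014, 2015, 2016, 2017, 2018]

def one_hot(year, levels=TRAIN_LEVELS):
    """Return one 0/1 slot per training level; an unseen year gives all zeros."""
    return [1 if year == level else 0 for level in levels]

def branch(encoded, split_level, levels=TRAIN_LEVELS):
    """Mirror the pseudocode: go 'right' only if year == split_level, else 'left'."""
    return "right" if encoded[levels.index(split_level)] == 1 else "left"

print(one_hot(2019))                                         # [0, 0, 0, 0, 0]
print([branch(one_hot(2019), lvl) for lvl in TRAIN_LEVELS])  # ['left', 'left', 'left', 'left', 'left']
print(branch(one_hot(2016), 2016))                           # right
```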

import pandas as pd

# While training, say year has the values below
df = pd.DataFrame([2014, 2015, 2016, 2017, 2018], columns=['year'])
data = pd.get_dummies(df, columns=['year'])
data.head()

# While predicting, say the input for year is 2018
# Use the same (integer) values as in training so the categories match
known_categories = [2014, 2015, 2016, 2017, 2018]
year_type = pd.Series([2018])
year_type = pd.Categorical(year_type, categories=known_categories)
pd.get_dummies(year_type)
# The column names do not matter, only the values (in the training column order),
# which are what is input to the model
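A common, more general pattern (offered here as an assumption, not necessarily what the answer's author does) is to save the dummy columns produced at training time and `reindex` every new batch to them: levels unseen during training are dropped, so their rows become all zeros, and training levels absent from the batch are filled with 0. The helper name `encode_new` is hypothetical.

```python
import pandas as pd

# Training: encode once and remember the exact dummy columns, in order
train = pd.DataFrame({"year": [2014, 2015, 2016, 2017, 2018]})
train_encoded = pd.get_dummies(train, columns=["year"], dtype=int)
train_columns = list(train_encoded.columns)  # ['year_2014', ..., 'year_2018']

def encode_new(df, columns=train_columns):
    """Hypothetical helper: one-hot encode new samples and align them to training.

    Columns for unseen levels (e.g. year_2019) are dropped by reindex, so such
    rows end up all zeros; training columns missing from df are filled with 0.
    """
    encoded = pd.get_dummies(df, columns=["year"], dtype=int)
    return encoded.reindex(columns=columns, fill_value=0)

new = pd.DataFrame({"year": [2018, 2019]})
print(encode_new(new))
```

The saved `train_columns` list has to be shipped with the model. scikit-learn's `OneHotEncoder(handle_unknown='ignore')`, fitted once on the training data and stored alongside the model, gives the same all-zeros behaviour for unseen levels.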
