如何在时间序列数据上网格化搜索Light GBM



我想对时间序列数据进行网格搜索。有什么功能可以这样做来搜索我在";lgb_ params";例如

lgb_params = {
"learning_rate": [0.001, 0.01, 0.1, 0.2],
"max_depth": [3, 5, 7, 9],
"num_leaves": [5, 10, 15],
"num_boost_round": 10000,
"early_stopping_rounds": 300,
"feature_fraction": [0.2, 0.3, 0.5, 0.7, 0.8],
"verbose": 0
}
lgbtrain = lgb.Dataset(data=X_train, label=y_train, feature_name=cols)
lgbval = lgb.Dataset(data=X_val, label=y_val, reference=lgbtrain, feature_name=cols)

model = lgb.train(lgb_params, lgbtrain,
valid_sets=[lgbtrain, lgbval],
num_boost_round=lgb_params['num_boost_round'],
early_stopping_rounds=lgb_params['early_stopping_rounds'],
feval=lgbm_smape,
verbose_eval=100)

当然,上面的代码最终不起作用,因为lgb-params包含值超过1的键(例如,learning_rate、max_depth等(。好吧,这些都是我真正想搜索的,这就是问题所在…

我想我想出了一个解决方案,它目前正在运行,还没有完成,因为它搜索了很多值,但下面是我写的函数,以备有人需要:

def param_search(lgb_param_dict):
min_error = float("inf")
best_params = dict()
best_iter = float("inf")
for i in range(len(lgb_param_dict["learning_rate"])):
lgb_params = dict()
lgb_params["learning_rate"] = lgb_param_dict["learning_rate"][i]
for j in range(len(lgb_param_dict["max_depth"])):
lgb_params["max_depth"] = lgb_param_dict["max_depth"][j]
for k in range(len(lgb_param_dict["num_leaves"])):
lgb_params["num_leaves"] = lgb_param_dict["num_leaves"][k]
for s in range(len(lgb_param_dict["feature_fraction"])):
lgb_params["feature_fraction"] = lgb_param_dict["feature_fraction"][s]
print(" ")
print("##########")
print("Learning_rate = " + str(lgb_params["learning_rate"]))
print("max_depth = " + str(lgb_params["max_depth"]))
print("num_leaves = " + str(lgb_params["num_leaves"]))
print("feature_fraction = " + str(lgb_params["feature_fraction"]))
model = lgb.train(lgb_params, lgbtrain,
valid_sets=[lgbtrain, lgbval],
num_boost_round=lgb_full_params["num_boost_round"],
early_stopping_rounds=lgb_full_params["early_stopping_rounds"],
feval=lgbm_smape,
verbose_eval=500)
print("Learning_rate = " + str(lgb_params["learning_rate"]))
print("max_depth = " + str(lgb_params["max_depth"]))
print("num_leaves = " + str(lgb_params["num_leaves"]))
print("feature_fraction = " + str(lgb_params["feature_fraction"]))
if min_error > dict(model.best_score["valid_1"])["SMAPE"]:
min_error = dict(model.best_score["valid_1"])["SMAPE"]
best_params = model.params
best_iter = model.best_iteration
else:
continue
return min_error, best_params, best_iter

打印报表是为了可读性。可能有更好的方法来编写这个函数,但如果它完成得没有任何问题,我会批准它作为答案。

编辑:成功了!

最新更新