r - What causes the loss function to be nan in a tidymodels specification



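I'm working through a problem with this dataset. I'm trying to build a model that predicts Japanese sales (JP_Sales) from all the other predictors (except Rank, Name, and Global_Sales, which is too strongly correlated with the outcome variable). So I did this: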

vgames <- read_csv('data/vgsales.csv', show_col_types = FALSE, col_types = list(
    Year = col_date("%Y")
  )) %>%
  mutate(
    Platform = factor(Platform),
    Genre = factor(Genre),
    Publisher = factor(Publisher)
  )
vgames_model <- vgames %>%
  select(-c(Rank, Name, Global_Sales))
# Train test split
vgames_split <- vgames_model %>% initial_split()
vgames_training <- vgames_split %>% training()
vgames_testing <- vgames_split %>% testing()
# Folds for CV
vgames_folds <- vgames_training %>% vfold_cv(v = 10)
# Recipe
vgames_recipe <- vgames_training %>%
  recipe(formula = JP_Sales ~ .) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_date(Year, features = c("year"), keep_original_cols = FALSE) %>%
  step_dummy(all_nominal()) %>%
  step_zv(all_numeric_predictors())

The output of this recipe looks like this:

# A tibble: 12,448 × 570
NA_Sales EU_Sales Other_…¹ JP_Sa…² Year_…³ Platf…⁴ Platf…⁵ Platf…⁶ Platf…⁷ Platf…⁸ Platf…⁹ Platf…˟ Platf…˟ Platf…˟
<dbl>    <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1   -0.272  -0.279   -0.240     0       2006       0       0       0       0       0       0       0       0       0
2    0.145   0.258    0.0629    0       2012       0       0       0       0       0       0       0       0       0
3   -0.198  -0.241   -0.189     0.07    2008       0       0       0       1       0       0       0       0       0
4   -0.149  -0.260   -0.189     0       2010       0       0       0       1       0       0       0       0       0
5   -0.149  -0.0679  -0.0380    0       2006       0       0       0       0       0       0       0       0       0
6   -0.296  -0.183   -0.189     0       2015       0       1       0       0       0       0       0       0       0
7    3.32    1.05     0.315     1.81    1988       0       0       0       0       0       0       0       0       0
8   -0.308  -0.260   -0.240     0       2016       0       0       0       0       0       0       0       0       0
9   -0.321  -0.202   -0.240     0       2015       0       0       0       0       0       0       0       0       0
10   -0.112  -0.145   -0.139     0       2010       0       0       0       0       0       0       0       0       0
# … with 12,438 more rows, 556 more variables: Platform_N64 <dbl>, Platform_NES <dbl>, Platform_NG <dbl>,
#   Platform_PC <dbl>, Platform_PCFX <dbl>, Platform_PS <dbl>, Platform_PS2 <dbl>, Platform_PS3 <dbl>,
#   Platform_PS4 <dbl>, Platform_PSP <dbl>, Platform_PSV <dbl>, Platform_SAT <dbl>, Platform_SCD <dbl>,
#   Platform_SNES <dbl>, Platform_TG16 <dbl>, Platform_Wii <dbl>, Platform_WiiU <dbl>, Platform_WS <dbl>,
#   Platform_X360 <dbl>, Platform_XB <dbl>, Platform_XOne <dbl>, Genre_Adventure <dbl>, Genre_Fighting <dbl>,
#   Genre_Misc <dbl>, Genre_Platform <dbl>, Genre_Puzzle <dbl>, Genre_Racing <dbl>, Genre_Role.Playing <dbl>,
#   Genre_Shooter <dbl>, Genre_Simulation <dbl>, Genre_Sports <dbl>, Genre_Strategy <dbl>, …
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Now, here is the problem: when I define and fit the mlp, every epoch reports nan for the loss function and the other metrics, i.e.:

nn <- mlp(epochs = 20) %>%
  set_engine('keras', verbose = 1, metrics = c("mae"), optimizer = 'adam', loss = 'mean_absolute_error') %>%
  set_mode('regression')
nnwf <- workflow() %>%
  add_model(nn) %>%
  add_recipe(vgames_recipe)
nnwf %>% fit(vgames_training)

which produces

...
Epoch 16/20
389/389 [==============================] - 1s 1ms/step - loss: nan - mae: nan
Epoch 17/20
389/389 [==============================] - 1s 1ms/step - loss: nan - mae: nan
Epoch 18/20
389/389 [==============================] - 1s 2ms/step - loss: nan - mae: nan
Epoch 19/20
389/389 [==============================] - 1s 2ms/step - loss: nan - mae: nan
Epoch 20/20
389/389 [==============================] - 1s 1ms/step - loss: nan - mae: nan

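I've looked around and tried normalizing in other ways, lowering the learning rate (both in the mlp() function and in the set_engine specification), and removing the date column entirely. None of these worked, and I'm having a hard time figuring out what the cause is. Has anyone run into this before?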

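The original Year column has missing data, and missing data produces missing statistics: the NA values carry through the recipe into the network's input, so the loss comes out as nan.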
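
To check this on your own data, a minimal base R sketch along these lines may help. The `toy` data frame and its values are made up for illustration and only stand in for `vgames_model`; the column names are borrowed from the question.

```r
# Toy stand-in for vgames_model (hypothetical values, not taken from vgsales.csv)
toy <- data.frame(
  JP_Sales = c(0, 0.07, 1.81, 0),
  NA_Sales = c(0.41, 0.15, 3.32, 0.29),
  Year     = as.Date(c("2006-01-01", NA, "1988-01-01", "2010-01-01"))
)

# Count missing values per column; any non-zero count is a candidate cause
na_counts <- colSums(is.na(toy))
print(na_counts)

# A statistic computed over a column that contains NA is itself NA
year_num <- as.numeric(format(toy$Year, "%Y"))
print(mean(year_num))

# One option: keep only complete rows before splitting and modelling
toy_complete <- toy[complete.cases(toy), ]
print(nrow(toy_complete))
```

In the original pipeline the same idea would probably be applied to vgames_model before initial_split(), for example with tidyr's drop_na(Year). Imputing the missing years instead of dropping those rows is another option, depending on how many rows are affected.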
