I need to compute the prediction accuracy on the test data in H2O using only dep_delay (dep_delay > 30) as the predictor.
I first specify the response:
response <- "late_arrival"
Then I specify the predictors:
predictors <- filter(flights, flights$dep_delay>30)
Then I fit the GLM:
flights_test_delay_glm <- h2o.glm(training_frame=flights_test, x=predictors, y=response, family="binomial")
I get this error:
Error in .verify_dataxy(training_frame, x, y) :
`x` must be column names or indices
I did cross-check the predictor values, and they look fine:
summary(predictors)
X year month day dep_time
Min. : 86 Min. :2013 Min. : 1.000 Min. : 1.00 Min. : 1
1st Qu.:103457 1st Qu.:2013 1st Qu.: 4.000 1st Qu.: 9.00 1st Qu.:1428
Median :186217 Median :2013 Median : 6.000 Median :16.00 Median :1755
Mean :178012 Mean :2013 Mean : 6.372 Mean :15.79 Mean :1676
3rd Qu.:253087 3rd Qu.:2013 3rd Qu.: 9.000 3rd Qu.:23.00 3rd Qu.:2028
Max. :336764 Max. :2013 Max. :12.000 Max. :31.00 Max. :2400
sched_dep_time dep_delay arr_time sched_arr_time arr_delay
Min. : 500 Min. : 31.00 Min. : 1 Min. : 1 Min. : -42.00
1st Qu.:1334 1st Qu.: 44.00 1st Qu.:1308 1st Qu.:1457 1st Qu.: 39.00
Median :1645 Median : 66.00 Median :1841 Median :1841 Median : 65.00
Mean :1581 Mean : 86.82 Mean :1598 Mean :1730 Mean : 83.29
3rd Qu.:1910 3rd Qu.: 107.00 3rd Qu.:2134 3rd Qu.:2112 3rd Qu.: 108.00
Max. :2359 Max. :1301.00 Max. :2400 Max. :2359 Max. :1272.00
NA's :216 NA's :386
carrier flight tailnum origin dest
EV :11655 Min. : 1.0 N15910 : 84 EWR:19914 ORD : 2653
B6 : 8411 1st Qu.: 619.5 N258JB : 79 JFK:15241 ATL : 2268
UA : 7617 Median :1692.0 N14573 : 78 LGA:13136 BOS : 1840
DL : 4982 Mean :2250.0 N15980 : 77 MCO : 1814
MQ : 3730 3rd Qu.:4100.0 N725MQ : 77 SFO : 1733
AA : 3537 Max. :8500.0 N12921 : 76 FLL : 1708
(Other): 8359 (Other):47820 (Other):36275
air_time distance hour minute
Min. : 20.0 Min. : 80.0 Min. : 5.00 Min. : 0.00
1st Qu.: 77.0 1st Qu.: 483.0 1st Qu.:13.00 1st Qu.:10.00
Median :120.0 Median : 762.0 Median :16.00 Median :29.00
Mean :140.7 Mean : 971.2 Mean :15.54 Mean :27.57
3rd Qu.:171.0 3rd Qu.:1134.0 3rd Qu.:19.00 3rd Qu.:45.00
Max. :666.0 Max. :4983.0 Max. :23.00 Max. :59.00
NA's :386
time_hour
2013-08-08 19:00:00: 52
2013-08-08 17:00:00: 51
2013-07-22 17:00:00: 49
2013-03-08 17:00:00: 48
2013-06-25 17:00:00: 48
2013-07-28 19:00:00: 48
(Other) :47995
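I need help understanding whether I have coded the predictor incorrectly, since the task only says to use dep_delay greater than 30 as the predictor. Thanks!
The x argument accepts a list (or vector) of column names or indices. Check the data type of predictors to verify whether you are passing a vector of names or a data frame. In the code above, filter() returns a data frame of rows, not column names, which is what triggers the error. You can see an example of how to use this argument here.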
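To make the difference concrete, here is a minimal sketch in base R. It assumes that "dep_delay > 30 as the predictor" means a derived TRUE/FALSE column, which is only one reading of the task. The toy data, the column name `dep_delayed_30`, and the helper `fit_and_score` are all hypothetical and not part of the original code. The helper is defined but not called, because it needs the h2o package and a running cluster (started with `h2o.init()`).

```r
# Toy stand-in for the flights data; the values are made up for illustration.
flights_toy <- data.frame(
  dep_delay    = c(5, 45, 120, -3, 31, 10),
  late_arrival = factor(c(0, 1, 1, 0, 1, 0))
)

# What the question passed as `x`: a filtered data frame, not column names.
predictors_wrong <- flights_toy[flights_toy$dep_delay > 30, ]
is.data.frame(predictors_wrong)   # TRUE, so h2o.glm() rejects it as `x`

# Assumed interpretation: turn the condition into an indicator column.
# `dep_delayed_30` is a hypothetical name.
flights_toy$dep_delayed_30 <- as.factor(flights_toy$dep_delay > 30)

response   <- "late_arrival"
predictors <- "dep_delayed_30"    # character vector of column names
is.character(predictors)          # TRUE

# Hypothetical helper: fit on a training data frame, score a test data frame.
# Both data frames must already contain the indicator and response columns.
fit_and_score <- function(train_df, test_df, x, y) {
  train_hf <- h2o::as.h2o(train_df)
  test_hf  <- h2o::as.h2o(test_df)
  model <- h2o::h2o.glm(x = x, y = y,
                        training_frame = train_hf, family = "binomial")
  # Returns the performance metrics of the model on the test frame.
  h2o::h2o.performance(model, newdata = test_hf)
}
```

If the task instead means "use dep_delay itself, restricted to rows where it exceeds 30", the filtering would be applied to the frame passed as training_frame, and x would simply be "dep_delay". Either way, x stays a vector of column names.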