R - How to compute predictions on test data using one predictor



I'm trying to compute the prediction accuracy on the test data in H2O using only dep_delay (dep_delay > 30) as the predictor.

I first specify the response:

response <- "late_arrival"

Then I specify the predictor:

predictors <- filter(flights, flights$dep_delay>30)

Then I fit the GLM:

flights_test_delay_glm <- h2o.glm(training_frame=flights_test, x=predictors, y=response, family="binomial")

I get this error:

Error in .verify_dataxy(training_frame, x, y) : 
  `x` must be column names or indices

I did cross-check the predictor values, and they look fine:

summary(predictors)

    X               year          month             day           dep_time   
 Min.   :    86   Min.   :2013   Min.   : 1.000   Min.   : 1.00   Min.   :   1  
 1st Qu.:103457   1st Qu.:2013   1st Qu.: 4.000   1st Qu.: 9.00   1st Qu.:1428  
 Median :186217   Median :2013   Median : 6.000   Median :16.00   Median :1755  
 Mean   :178012   Mean   :2013   Mean   : 6.372   Mean   :15.79   Mean   :1676  
 3rd Qu.:253087   3rd Qu.:2013   3rd Qu.: 9.000   3rd Qu.:23.00   3rd Qu.:2028  
 Max.   :336764   Max.   :2013   Max.   :12.000   Max.   :31.00   Max.   :2400  
 sched_dep_time   dep_delay          arr_time    sched_arr_time   arr_delay      
 Min.   : 500   Min.   :  31.00   Min.   :   1   Min.   :   1   Min.   : -42.00  
 1st Qu.:1334   1st Qu.:  44.00   1st Qu.:1308   1st Qu.:1457   1st Qu.:  39.00  
 Median :1645   Median :  66.00   Median :1841   Median :1841   Median :  65.00  
 Mean   :1581   Mean   :  86.82   Mean   :1598   Mean   :1730   Mean   :  83.29  
 3rd Qu.:1910   3rd Qu.: 107.00   3rd Qu.:2134   3rd Qu.:2112   3rd Qu.: 108.00  
 Max.   :2359   Max.   :1301.00   Max.   :2400   Max.   :2359   Max.   :1272.00  
                                  NA's   :216                   NA's   :386      
    carrier          flight          tailnum      origin           dest      
 EV     :11655   Min.   :   1.0   N15910 :   84   EWR:19914   ORD    : 2653  
 B6     : 8411   1st Qu.: 619.5   N258JB :   79   JFK:15241   ATL    : 2268  
 UA     : 7617   Median :1692.0   N14573 :   78   LGA:13136   BOS    : 1840  
 DL     : 4982   Mean   :2250.0   N15980 :   77               MCO    : 1814  
 MQ     : 3730   3rd Qu.:4100.0   N725MQ :   77               SFO    : 1733  
 AA     : 3537   Max.   :8500.0   N12921 :   76               FLL    : 1708  
 (Other): 8359                    (Other):47820               (Other):36275  
    air_time        distance           hour           minute     
 Min.   : 20.0   Min.   :  80.0   Min.   : 5.00   Min.   : 0.00  
 1st Qu.: 77.0   1st Qu.: 483.0   1st Qu.:13.00   1st Qu.:10.00  
 Median :120.0   Median : 762.0   Median :16.00   Median :29.00  
 Mean   :140.7   Mean   : 971.2   Mean   :15.54   Mean   :27.57  
 3rd Qu.:171.0   3rd Qu.:1134.0   3rd Qu.:19.00   3rd Qu.:45.00  
 Max.   :666.0   Max.   :4983.0   Max.   :23.00   Max.   :59.00  
 NA's   :386                                                     
               time_hour    
 2013-08-08 19:00:00:   52  
 2013-08-08 17:00:00:   51  
 2013-07-22 17:00:00:   49  
 2013-03-08 17:00:00:   48  
 2013-06-25 17:00:00:   48  
 2013-07-28 19:00:00:   48  
 (Other)            :47995  

I need help understanding whether I have coded the predictor incorrectly, since the task only says to use dep_delay greater than 30 as the predictor. Thanks!

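The x argument accepts a list (or vector) of column names or indices. In the question, predictors is the data frame returned by filter(), not a vector of names, which is why h2o.glm raises this error. Check the data type of predictors (for example with class(predictors)) to verify whether you are passing a vector of names or a data frame. You can see an example of how to use this argument here.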
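As a rough illustration of the point above, here is a minimal sketch in R. The toy data frame, the dep_delay_gt30 column and the fit_and_score helper are all made up for this example and are not from the question. It also assumes that "dep_delay greater than 30 as the predictor" can be read as either the raw column or a 0/1 indicator derived from it. The helper is only defined, not called, and would need the h2o package, a running cluster started with h2o.init(), and H2OFrame inputs.

```r
# Toy stand-in for the flights data (hypothetical values, for illustration only).
flights <- data.frame(
  dep_delay    = c(5, 45, 120, -3, 31, 30),
  late_arrival = c(0, 1, 1, 0, 1, 0)
)

# Filtering rows returns a data frame, which is not a valid `x` for h2o.glm.
predictors_wrong <- flights[flights$dep_delay > 30, ]
class(predictors_wrong)  # "data.frame"

# Option 1: pass the name of the existing column.
response   <- "late_arrival"
predictors <- "dep_delay"

# Option 2 (assumed reading of the task): derive a 0/1 indicator column
# for dep_delay > 30 and pass that column's name instead.
flights$dep_delay_gt30 <- as.integer(flights$dep_delay > 30)
predictors_indicator   <- "dep_delay_gt30"

# Either way, `x` is a character vector naming columns that exist in the data.
stopifnot(is.character(predictors), predictors %in% names(flights))
stopifnot(is.character(predictors_indicator),
          predictors_indicator %in% names(flights))

# Hypothetical helper: fit a binomial GLM on one H2OFrame and score another.
# Assumes both frames already contain the columns named in `x` and `y`.
fit_and_score <- function(train_hf, test_hf, x, y) {
  # A binomial GLM expects a two-level categorical response.
  train_hf[, y] <- h2o::h2o.asfactor(train_hf[, y])
  test_hf[, y]  <- h2o::h2o.asfactor(test_hf[, y])
  model <- h2o::h2o.glm(x = x, y = y,
                        training_frame = train_hf,
                        family = "binomial")
  # Metrics computed on the held-out frame.
  h2o::h2o.performance(model, newdata = test_hf)
}
```

With real data, the indicator column would need to be created on the H2OFrames themselves (or before converting with as.h2o) so that the name passed as x exists in both the training and the test frame.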
