r - Getting only NA responses from my random forest model



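This is my first post on Stack Overflow, so if you need more information, please ask in the comments below!

Situation: I have compiled water chemistry data for freshwater ecosystems in the Maritimes (Atlantic Canada), because I am trying to build a model that predicts the distribution of an invasive species using a random forest model (RFM). Unfortunately, Atlantic Canada lacks a consistent water monitoring program, and the existing programs do not all monitor the same parameters. As a result, my datasets (both training and testing) contain many NAs.

Problem: this is the response I get from the RFM: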

> p1 <- predict(model2, newdata=Test_Dataset,type="prob")[,2]
> p1

1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 17 19 20 21 23 24 25 26 28 29 30 31 32 33 34 NA NA NA 35 36 37 NA NA NA

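What I have tried:

  1. I built the RFM (i.e. model2) using a variety of predictors. I did include na.action:

    model2 <- randomForest(CMS ~ Lat+Lon+pH+Alkalinity+Ca+Hardness+DO+TOC+T_P+T_N+Cond+Na+No_Stocking+No_Fish_Species+Dist_Hwy+No_Boat_Launches+Connected_Lakes+Invasives, importance=TRUE, data=TrainSet, na.action=na.roughfix)
    model2

**Note: the long list of variables are the predictors, and CMS is the species.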

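  • I tried matching the test dataset (Test_Dataset) to the training dataset (Validation_Dataset).

    Test_Dataset <- rbind(Validation_Dataset[1, ], Test_Dataset)
    Test_Dataset <- Test_Dataset[-1, ]

  • I searched and read multiple resources (including the obvious R pages and the references linked from there).

  • I mutated the data frames as follows (I am only showing Validation_Dataset, because the mutation is the same for both):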

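    # Mutate the dataset to fix how R reads the NA cells
    Validation_Dataset <- Validation_Dataset %>%
      dplyr::mutate(
        # Convert Year to a categorical variable
        Year = factor(Year),
        # Convert chlorophyll concentration from character to numeric
        # Convert "NA" to missing-value data where appropriate
        Chlorophyll = dplyr::na_if(Chlorophyll, "NA"),
        Chlorophyll = factor(Chlorophyll),
        Hardness = dplyr::na_if(Hardness, "NA"),
        Hardness = factor(Hardness),
        Alkalinity = dplyr::na_if(Alkalinity, "NA"),
        Alkalinity = factor(Alkalinity),
        Ca = dplyr::na_if(Ca, "NA"),
        Ca = factor(Ca),
        TOC = dplyr::na_if(TOC, "NA"),
        TOC = factor(TOC),
        Cond = dplyr::na_if(Cond, "NA"),
        Cond = factor(Cond),
        Na = dplyr::na_if(Na, "NA"),
        Na = factor(Cond),
        NH4 = dplyr::na_if(NH4, "NA"),
        NH4 = factor(NH4),
        NO3 = dplyr::na_if(NO3, "NA"),
        NO3 = factor(NO3),
        pH = dplyr::na_if(pH, "NA"),
        pH = factor(pH),
        T_N = dplyr::na_if(T_N, "NA"),
        T_N = factor(T_N),
        T_P = dplyr::na_if(T_P, "NA"),
        T_P = factor(T_P),
        DO = dplyr::na_if(DO, "NA"),
        DO = factor(DO),
        Salinity = dplyr::na_if(Salinity, "NA"),
        Salinity = factor(Salinity),
        No_Stocking = dplyr::na_if(No_Stocking, "NA"),
        No_Stocking = factor(No_Stocking),
        No_Fish_Species = dplyr::na_if(No_Fish_Species, "NA"),
        No_Fish_Species = factor(No_Fish_Species),
        Dist_Hwy = dplyr::na_if(Dist_Hwy, "NA"),
        Dist_Hwy = factor(Dist_Hwy),
        No_Boat_Launches = dplyr::na_if(No_Boat_Launches, "NA"),
        No_Boat_Launches = factor(No_Boat_Launches),
        Connected_Lakes = dplyr::na_if(Connected_Lakes, "NA"),
        Connected_Lakes = factor(Connected_Lakes),
        Invasives = dplyr::na_if(Invasives, "NA"),
        Invasives = factor(Invasives),
        Lat = factor(Lat),
        Lon = factor(Lon),
        CMS = factor(CMS)
      )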

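  • Question: does anyone know how to actually get the code working so that model2 makes predictions on Test_Dataset? I think the problem may actually be quite small, but I am not seeing it.

    Here is a glimpse of the training dataset (Validation_Dataset):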

    > str(Validation_Dataset)
    Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    37 obs. of  31 variables:
    $ Name            : chr  "Canard River" "Cedar Creek" "Holland River" "Speed River" ...
    $ STN #/COUNTY    : chr  "10000200202" "16001800202" "3007700202" "16018403402" ...
    $ Province        : chr  "ON" "ON" "ON" "ON" ...
    $ Lat             : Factor w/ 37 levels "42.03204214",..: 2 1 11 9 10 8 7 5 6 3 ...
    $ Lon             : Factor w/ 37 levels "-83.01879548",..: 1 2 11 8 10 6 7 9 5 4 ...
    $ Year            : Factor w/ 9 levels "2007, 2011","2010, 2015, 2011",..: 8 8 8 8 8 8 8 8 8 8 ...
    $ Month           : chr  "4" "4" "4" "4" ...
    $ Day             : chr  "11" "12" "26" "27" ...
    $ Data Source     : chr  "ON Provincial (Streams) Water Quality Monitoring Network" "ON Provincial (Streams) Water Quality Monitoring Network" "ON Provincial (Streams) Water Quality Monitoring Network" "ON Provincial (Streams) Water Quality Monitoring Network" ...
    $ pH              : Factor w/ 35 levels "6.073333","6.13",..: 18 21 28 29 25 34 30 32 19 26 ...
    $ Alkalinity      : Factor w/ 31 levels "1.8","2.8","3.933333333",..: 19 22 31 30 27 NA NA 26 NA 21 ...
    $ Hardness        : Factor w/ 13 levels "14.8","36.8",..: 7 8 11 10 9 NA NA 13 NA NA ...
    $ Ca              : Factor w/ 24 levels "3.833333333",..: 18 19 24 20 21 NA NA 22 NA NA ...
    $ Chlorophyll     : Factor w/ 15 levels "0.423601","0.453791",..: NA NA NA NA NA NA NA NA NA NA ...
    $ DO              : Factor w/ 26 levels "0.27","6.2","6.96",..: 21 24 18 16 4 25 17 14 2 7 ...
    $ TOC             : Factor w/ 3 levels "4.8","5.5","8.8": NA NA NA NA NA NA NA NA NA NA ...
    $ T_P             : Factor w/ 24 levels "0.002","0.003",..: 23 22 18 10 15 14 16 13 21 20 ...
    $ T_N             : Factor w/ 32 levels "0.006","0.13",..: 30 31 27 28 17 29 24 32 21 25 ...
    $ NO3+NO2         : num  2.173 2.292 1.092 1.695 0.426 ...
    $ NO3             : Factor w/ 32 levels "0.027","0.035",..: 30 31 26 27 11 29 24 32 22 8 ...
    $ NH4             : Factor w/ 27 levels "0.005","0.006",..: 26 25 22 17 9 11 13 19 23 27 ...
    $ Cond            : Factor w/ 34 levels "41","97","134",..: 24 21 29 23 22 14 34 31 21 17 ...
    $ Salinity        : Factor w/ 9 levels "0.11","0.15",..: NA NA NA NA NA NA NA NA NA NA ...
    $ Na              : Factor w/ 34 levels "41","97","134",..: 24 21 29 23 22 14 34 31 21 17 ...
    $ No_Stocking     : Factor w/ 3 levels "0","1","2": 1 2 2 3 1 2 1 2 1 2 ...
    $ No_Fish_Species : Factor w/ 9 levels "0","1","2","3",..: 1 4 6 4 1 5 1 9 1 9 ...
    $ Dist_Hwy        : Factor w/ 16 levels "0.003","0.006",..: NA NA 16 NA NA NA NA 8 NA 5 ...
    $ No_Boat_Launches: Factor w/ 8 levels "0","1","2","3",..: 1 1 5 1 1 1 1 8 1 3 ...
    $ Connected_Lakes : Factor w/ 11 levels "0","1","2","3",..: 7 2 3 4 9 6 2 3 2 5 ...
    $ Invasives       : Factor w/ 3 levels "0","1","2": NA NA NA NA NA NA NA NA NA NA ...
    $ CMS             : Factor w/ 2 levels "NO","YES": 2 2 2 2 2 2 2 2 2 2 ...
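
The `str()` output above shows continuous measurements such as pH and Cond stored as factors with close to one level per row. A minimal base R sketch of converting such columns to numeric instead is given here. It assumes the columns were read in as character with literal "NA" strings; the data frame `raw` and the helper `to_numeric` are hypothetical names used only for illustration.

```r
# Toy stand-in for columns read as character, with literal "NA" strings
raw <- data.frame(
  pH   = c("6.07", "NA", "7.20"),
  Cond = c("41", "97", "NA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "NA" strings into real missing values,
# then convert the text to numbers
to_numeric <- function(x) {
  x <- as.character(x)      # also makes factor input safe
  x[x %in% "NA"] <- NA      # same idea as dplyr::na_if(x, "NA")
  as.numeric(x)
}

clean <- as.data.frame(lapply(raw, to_numeric))
str(clean)

# Pitfall: as.numeric() on a factor returns the level codes, not the values
f <- factor(c("6.07", "7.20"))
codes  <- as.numeric(f)                 # level codes
values <- as.numeric(as.character(f))   # the measurements themselves
```

With numeric predictors, a new value in the test set is just another number rather than an unseen factor level.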
    

Use na.roughfix. If you want to use it, you first have to apply it outside the randomForest function. I will use the iris dataset as an example.

    iris.roughfix <- na.roughfix(iris.na)
    iris.narf <- randomForest(Species ~ ., iris.na, na.action=na.roughfix)
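
As far as I can tell, `na.action` only affects model fitting, so rows of `newdata` that still contain missing predictors come back as NA from `predict()`, which matches the output in the question. Here is a minimal sketch of imputing the test set before predicting. The train/test split and the positions of the injected NAs are made up for illustration. Note that `na.roughfix(test)` fills gaps with the test set's own medians and modes, which is a simplification.

```r
library(randomForest)

set.seed(1)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Inject missing values at known positions (illustrative only)
train[1:10, "Sepal.Length"] <- NA
test[1:5, "Petal.Width"]    <- NA

# na.action handles the NAs in the training data during fitting
fit <- randomForest(Species ~ ., data = train, na.action = na.roughfix)

# Predicting on test rows that still contain NA
p_raw <- predict(fit, newdata = test, type = "prob")

# Impute the test set first (median for numeric, mode for factor columns)
test_fixed <- na.roughfix(test)
p_fixed    <- predict(fit, newdata = test_fixed, type = "prob")

anyNA(p_raw)    # are there NA predictions without imputation?
anyNA(p_fixed)  # and after imputing the test set?
```

`na.roughfix` only accepts numeric or factor columns, so character columns such as names or data sources would need to be dropped or converted first.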
    
