R - if (any(co)) {: missing value where TRUE/FALSE needed



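I have been training several models, and when I try to fit a support vector machine with a radial basis function kernel I get the following error (my R session runs in a Spanish locale; the message corresponds to "missing value where TRUE/FALSE needed", and the warnings to "NAs introduced by coercion"):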

> svmRFit <- train(x = Fraud_trainX, 
+                  y = Fraud_trainY, 
+                  method = "svmRadial",
+                  metric = "ROC",
+                  preProc = c("center", "scale"),
+                  tuneLength = 15,
+                  trControl = ctrl)
Error in if (any(co)) { : valor ausente donde TRUE/FALSE es necesario
Además: Warning messages:
1: In FUN(newX[, i], ...) : NAs introducidos por coerción
2: In FUN(newX[, i], ...) : NAs introducidos por coerción
3: In FUN(newX[, i], ...) : NAs introducidos por coerción
4: In FUN(newX[, i], ...) : NAs introducidos por coerción
5: In FUN(newX[, i], ...) : NAs introducidos por coerción
Called from: .local(x, ...)
Browse[1]>

Here is a summary of my data set:

summary(Fraud_trainX)
Make      AccidentArea                PolicyType   VehicleCategory
Pontiac  :1412   Rural: 597   SedC                :2109   Sedan  :3660   
Toyota   :1177   Urban:5186   SedL                :1857   Sport  :1994   
Honda    :1054                SedA                :1551   Utility: 129   
Mazda    : 883                SpoC                : 126                  
Chevrolet: 637                Utility - All Perils: 113                  
Accura   : 183                UtiCL               :  16                  
(Other)  : 437                (Other)             :  11                  
BasePolicy WeekOfMonthClaimed      Age         PolicyNumber     RepNumber     
AP:1675    Min.   :1.000      Min.   :16.00   Min.   :    2   Min.   : 1.000  
C :2246    1st Qu.:2.000      1st Qu.:31.00   1st Qu.: 3866   1st Qu.: 4.000  
L :1862    Median :3.000      Median :38.00   Median : 7757   Median : 9.000  
           Mean   :2.703      Mean   :40.71   Mean   : 7754   Mean   : 8.473  
           3rd Qu.:4.000      3rd Qu.:49.00   3rd Qu.:11556   3rd Qu.:12.000  
           Max.   :5.000      Max.   :80.00   Max.   :15420   Max.   :16.000  
                              NA's   :130
Deductible     DriverRating     ClaimSize          Month       
Min.   :400.0   Min.   :1.000   Min.   :     0   Min.   : 1.000  
1st Qu.:400.0   1st Qu.:1.000   1st Qu.:  4112   1st Qu.: 3.000  
Median :400.0   Median :3.000   Median :  8150   Median : 6.000  
Mean   :407.3   Mean   :2.488   Mean   : 22921   Mean   : 6.384  
3rd Qu.:400.0   3rd Qu.:3.000   3rd Qu.: 43446   3rd Qu.: 9.000  
Max.   :700.0   Max.   :4.000   Max.   :141394   Max.   :12.000  
NA's   :4                                        
WeekOfMonth      DayOfWeek     DayOfWeekClaimed  MonthClaimed   
Min.   :1.000   Min.   :1.000   Min.   :1.000    Min.   : 1.000  
1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000    1st Qu.: 3.000  
Median :3.000   Median :4.000   Median :3.000    Median : 6.000  
Mean   :2.776   Mean   :3.844   Mean   :2.824    Mean   : 6.345  
3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.:4.000    3rd Qu.: 9.000  
Max.   :5.000   Max.   :7.000   Max.   :7.000    Max.   :12.000  

Sex         MaritalStatus       Fault         VehiclePrice  
Min.   :0.0000   Min.   :1.000   Min.   :0.0000   Min.   :1.000  
1st Qu.:1.0000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:2.000  
Median :1.0000   Median :2.000   Median :0.0000   Median :2.000  
Mean   :0.8406   Mean   :1.698   Mean   :0.2722   Mean   :2.783  
3rd Qu.:1.0000   3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.:3.000  
Max.   :1.0000   Max.   :3.000   Max.   :1.0000   Max.   :6.000  

Days_Policy_Accident Days_Policy_Claim PastNumberOfClaims  AgeOfVehicle  
Min.   :0.000        Min.   :1.000     Min.   :0.000      Min.   :0.000  
1st Qu.:4.000        1st Qu.:3.000     1st Qu.:0.000      1st Qu.:6.000  
Median :4.000        Median :3.000     Median :1.000      Median :7.000  
Mean   :3.971        Mean   :2.993     Mean   :1.333      Mean   :6.592  
3rd Qu.:4.000        3rd Qu.:3.000     3rd Qu.:2.000      3rd Qu.:8.000  
Max.   :4.000        Max.   :3.000     Max.   :3.000      Max.   :8.000  
    
AgeOfPolicyHolder PoliceReportFiled WitnessPresent      AgentType      
Min.   :1.00      Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
1st Qu.:5.00      1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.00000  
Median :6.00      Median :0.00000   Median :0.00000   Median :0.00000  
Mean   :5.89      Mean   :0.02957   Mean   :0.00536   Mean   :0.01504  
3rd Qu.:7.00      3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.00000  
Max.   :9.00      Max.   :1.00000   Max.   :1.00000   Max.   :1.00000  
  
NumberOfSuppliments AddressChange_Claim  NumberOfCars   
Min.   :0.000       Min.   :0.0000      Min.   :0.0000  
1st Qu.:0.000       1st Qu.:0.0000      1st Qu.:0.0000  
Median :1.000       Median :0.0000      Median :0.0000  
Mean   :1.163       Mean   :0.1757      Mean   :0.1027  
3rd Qu.:2.000       3rd Qu.:0.0000      3rd Qu.:0.0000  
Max.   :3.000       Max.   :3.0000      Max.   :3.0000 

Structure of the data set:

str(Fraud_trainX)
'data.frame':   5783 obs. of  32 variables:
$ Make                : Factor w/ 19 levels "Accura","BMW",..: 7 18 6 7 6 6 6 3 10 7 ...
$ AccidentArea        : Factor w/ 2 levels "Rural","Urban": 2 1 2 1 2 2 2 2 2 2 ...
$ PolicyType          : Factor w/ 8 levels "SedA","SedC",..: 5 3 3 2 3 3 1 2 3 2 ...
$ VehicleCategory     : Factor w/ 3 levels "Sedan","Sport",..: 2 2 2 1 2 2 1 1 2 1 ...
$ BasePolicy          : Factor w/ 3 levels "AP","C","L": 2 3 3 2 3 3 1 2 3 2 ...
$ WeekOfMonthClaimed  : num  4 1 3 1 1 5 1 1 1 4 ...
$ Age                 : num  34 65 28 NA 61 38 41 28 40 21 ...
$ PolicyNumber        : num  2 4 13 14 15 16 17 18 21 27 ...
$ RepNumber           : num  15 4 11 12 3 16 15 6 3 1 ...
$ Deductible          : num  400 400 400 400 400 400 400 400 400 400 ...
$ DriverRating        : num  4 2 1 3 1 1 4 1 1 2 ...
$ ClaimSize           : num  59294 7584 59748 82212 59552 ...
$ Month               : int  1 6 1 1 1 8 4 7 4 3 ...
$ WeekOfMonth         : int  3 2 3 5 5 4 4 5 2 3 ...
$ DayOfWeek           : int  3 6 5 5 1 2 4 7 5 4 ...
$ DayOfWeekClaimed    : int  1 5 5 3 4 1 3 3 2 4 ...
$ MonthClaimed        : int  1 7 1 2 2 8 5 8 5 6 ...
$ Sex                 : int  1 1 1 1 1 1 1 0 1 1 ...
$ MaritalStatus       : int  1 2 2 1 2 1 2 2 2 2 ...
$ Fault               : int  0 1 0 1 0 0 0 1 0 0 ...
$ VehiclePrice        : int  6 2 6 6 6 6 6 2 2 3 ...
$ Days_Policy_Accident: int  4 4 4 4 4 4 4 4 4 4 ...
$ Days_Policy_Claim   : int  3 3 3 3 3 3 3 3 3 3 ...
$ PastNumberOfClaims  : int  0 1 1 0 0 0 0 0 1 3 ...
$ AgeOfVehicle        : int  6 8 7 0 8 6 7 7 8 5 ...
$ AgeOfPolicyHolder   : int  5 8 5 1 8 6 6 5 6 4 ...
$ PoliceReportFiled   : int  1 1 0 0 0 0 0 0 0 0 ...
$ WitnessPresent      : int  0 0 0 0 0 0 0 0 0 0 ...
$ AgentType           : int  0 0 0 0 0 0 0 0 0 0 ...
$ NumberOfSuppliments : int  0 3 0 0 0 0 0 1 3 3 ...
$ AddressChange_Claim : int  0 0 0 0 0 0 0 0 0 0 ...
$ NumberOfCars        : int  0 0 0 0 0 0 0 0 0 0 ...

The response variable:

summary(Fraud_trainY)
No  Yes 
5440  343 

Here are the resampling indices and the control object I use for model training:

indx <- createMultiFolds(Fraud_trainY, k = 5, times = 2)
str(indx)
ctrl <- trainControl(method = "repeatedcv", index = indx,
                     summaryFunction = twoClassSummary,
                     sampling = "up",
                     classProbs = TRUE)

And here is the model call:

svmRFit <- train(x = Fraud_trainX,
                 y = Fraud_trainY,
                 method = "svmRadial",
                 metric = "ROC",
                 preProc = c("center", "scale"),
                 tuneLength = 15,
                 trControl = ctrl)

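I have already tried loading the pROC library, which made no difference. I have also removed the rows containing NA from all variables, and the response variable already has the levels "No" and "Yes". I ran the same training with C5.0 ("C5.0"), a neural network ("nnet") and logistic regression ("multinom"); all of them worked and returned a fitted model. This is the only model that fails with an error.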
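Because the `summary()` output above still lists `NA's` for `Age`, it may be worth confirming that the NA removal actually reached the object passed to `train()`. The following is only a sketch in base R: `toy_x` and `toy_y` are hypothetical stand-ins for `Fraud_trainX` and `Fraud_trainY`, with made-up values.

```r
# Hypothetical toy data standing in for Fraud_trainX / Fraud_trainY
toy_x <- data.frame(Age        = c(34, NA, 28, 61),
                    Deductible = c(400, 400, NA, 700))
toy_y <- factor(c("No", "Yes", "No", "No"), levels = c("No", "Yes"))

# Count the missing values in each predictor column
na_per_col <- colSums(is.na(toy_x))
print(na_per_col)

# Keep only the rows with no missing predictor, and drop the same
# rows from the response so that x and y stay aligned
keep    <- complete.cases(toy_x)
x_clean <- toy_x[keep, , drop = FALSE]
y_clean <- toy_y[keep]
```

Filtering `x` and `y` with the same logical vector keeps the two objects the same length, which the `x`/`y` interface of `train()` relies on.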

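As @AlvaroMartinez pointed out in the comments, the problem was that some of my variables were factors; once I converted those variables to integer, the model trained without errors.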
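The factor-to-integer conversion can be sketched in base R as below. `toy_x` is a hypothetical stand-in for `Fraud_trainX` with made-up values, not the original data. The presumable cause of the error is that the `x`/`y` interface of `train()` does not expand factors into dummy variables, so the SVM receives columns it cannot treat as numbers, which would match the "NAs introduced by coercion" warnings. Integer codes impose an arbitrary ordering on unordered factors such as `Make`, so the second option, indicator columns built with `model.matrix()`, is shown as an alternative rather than as what was done in the question.

```r
# Hypothetical toy data standing in for Fraud_trainX
toy_x <- data.frame(
  AccidentArea = factor(c("Urban", "Rural", "Urban", "Urban")),
  BasePolicy   = factor(c("C", "L", "AP", "C")),
  Age          = c(34, 65, 28, 61)
)

# Find the factor columns
is_fac <- vapply(toy_x, is.factor, logical(1))

# Option 1 (the fix described above): replace every factor
# by the integer code of its level
x_int <- toy_x
x_int[is_fac] <- lapply(x_int[is_fac], as.integer)

# Option 2 (alternative): one 0/1 indicator column per non-reference
# level; the first column of model.matrix() is the intercept, so drop it
x_dummy <- model.matrix(~ ., data = toy_x)[, -1, drop = FALSE]

str(x_int)
colnames(x_dummy)
```

Either `x_int` or `x_dummy` is fully numeric and could then be passed as `x` in place of the original data frame.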
