How do I fix the wrong variable type error when using ROSE on an imbalanced dataset in R?



I am learning R with the Fraud Transaction data. When I try to use ROSE to handle the imbalanced dataset, I get an error saying that ROSE handles only continuous and categorical variables.

Here is what I tried:

str(dataset)
'data.frame':   6362620 obs. of  13 variables:
$ step            : int  1 1 1 1 1 1 1 1 1 1 ...
$ type            : chr  "PAYMENT" "PAYMENT" "TRANSFER" "CASH_OUT" ...
$ amount          : num  9840 1864 181 181 11668 ...
$ nameOrig        : chr  "C1231006815" "C1666544295" "C1305486145" "C840083671" ...
$ oldbalanceOrg   : num  170136 21249 181 181 41554 ...
$ newbalanceOrig  : num  160296 19385 0 0 29886 ...
$ nameDest        : chr  "M1979787155" "M2044282225" "C553264065" "C38997010" ...
$ oldbalanceDest  : num  0 0 0 21182 0 ...
$ newbalanceDest  : num  0 0 0 0 0 ...
$ isFraud         : int  0 0 1 1 0 0 0 0 0 0 ...
$ isFlaggedFraud  : int  0 0 0 0 0 0 0 0 0 0 ...
$ balancedOfOrigin: num  -9840 -1864 -181 -181 -11668 ...
$ balancedOfDest  : num  0 0 0 21182 0 ...
datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data

This fails with the error:

Error in rose.sampl(n, N, p, ind.majo, majoY, ind.mino, minoY, y, classy,  : The current implementation of ROSE handles only continuous and categorical variables.

My debugging attempt:

# change the isFraud attribute into category 0/1
dataset$isFraud = as.factor(dataset$isFraud)
datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data

The error is still there. How can I make the dataset suitable for ROSE?

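As the str output shows, type, nameOrig, and nameDest are still character columns, and ROSE accepts only numeric (continuous) and factor (categorical) columns. Converting them to factors makes the error go away. Looking at nameOrig and nameDest, though, they appear to be account identifiers, so they do not seem suitable to include in ROSE.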

library(ROSE)
library(dplyr)  # provides %>% and mutate()

dummy2 <- head(dataset, 100)
dummy2$isFraud <- as.factor(dummy2$isFraud)
# additional part: convert the character columns to factors
dummy2 <- dummy2 %>%
  mutate(type = factor(type),
         nameDest = factor(nameDest),
         nameOrig = factor(nameOrig))
dummy3 <- ROSE(isFraud ~ ., data = dummy2, N = 500, seed = 111)$data
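
If you would rather leave nameOrig and nameDest out altogether, something along these lines might work. This is only a sketch: `toy` is a made-up stand-in for the real dataset (hypothetical values, same column types), and `unsupported`, `toy_clean`, and `balanced` are names I chose for illustration. It first lists the columns that are neither numeric nor factor, which are the ones ROSE complains about, then drops the ID-like columns and converts the rest.

```r
library(ROSE)

# Toy stand-in for the fraud data (hypothetical values, same column types).
set.seed(1)
n <- 200
toy <- data.frame(
  step     = sample(1:10, n, replace = TRUE),
  type     = sample(c("PAYMENT", "TRANSFER", "CASH_OUT"), n, replace = TRUE),
  amount   = runif(n, 1, 10000),
  nameOrig = paste0("C", seq_len(n)),
  nameDest = paste0("M", seq_len(n)),
  isFraud  = c(rep(1L, 20), rep(0L, n - 20)),
  stringsAsFactors = FALSE
)

# Columns that are neither numeric nor factor are the ones ROSE rejects.
is_supported <- sapply(toy, function(x) is.numeric(x) || is.factor(x))
unsupported <- names(toy)[!is_supported]
print(unsupported)

# Drop the ID-like columns and convert the remaining character column.
toy_clean <- toy[, setdiff(names(toy), c("nameOrig", "nameDest"))]
toy_clean$type    <- factor(toy_clean$type)
toy_clean$isFraud <- factor(toy_clean$isFraud)

# Same ROSE call as in the question, now on supported column types only.
balanced <- ROSE(isFraud ~ ., data = toy_clean, N = 500, seed = 111)$data
print(table(balanced$isFraud))
```

On the real data the same idea would apply to dataset instead of toy. Whether dropping the two name columns is appropriate depends on what you want the model to learn, so treat that part as an assumption rather than a rule.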