I am learning R using the Fraud Transaction data. When I try to use ROSE to handle the imbalanced dataset, I get an error saying that ROSE only handles continuous and categorical variables.
This is what I tried:
str(dataset)
'data.frame': 6362620 obs. of 13 variables:
$ step : int 1 1 1 1 1 1 1 1 1 1 ...
$ type : chr "PAYMENT" "PAYMENT" "TRANSFER" "CASH_OUT" ...
$ amount : num 9840 1864 181 181 11668 ...
$ nameOrig : chr "C1231006815" "C1666544295" "C1305486145" "C840083671" ...
$ oldbalanceOrg : num 170136 21249 181 181 41554 ...
$ newbalanceOrig : num 160296 19385 0 0 29886 ...
$ nameDest : chr "M1979787155" "M2044282225" "C553264065" "C38997010" ...
$ oldbalanceDest : num 0 0 0 21182 0 ...
$ newbalanceDest : num 0 0 0 0 0 ...
$ isFraud : int 0 0 1 1 0 0 0 0 0 0 ...
$ isFlaggedFraud : int 0 0 0 0 0 0 0 0 0 0 ...
$ balancedOfOrigin: num -9840 -1864 -181 -181 -11668 ...
$ balancedOfDest : num 0 0 0 21182 0 ...
datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data
This gives the error:
Error in rose.sampl(n, N, p, ind.majo, majoY, ind.mino, minoY, y, classy, : The current implementation of ROSE handles only continuous and categorical variables.
My attempt at debugging:
# change the isFraud attribute into category 0/1
dataset$isFraud = as.factor(dataset$isFraud)
datadata_ROSE <- ROSE(isFraud~., data = dataset, N = 500, seed = 111)$data
The error still occurs. How can I make the dataset suitable for ROSE?
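From the str output you can see that type, nameOrig, and nameDest are still character columns. You need to convert them to factors. Looking at nameOrig and nameDest, though, they do not seem suitable for inclusion in ROSE.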
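A quick way to confirm which columns are the problem is to list each column's class. The following is a minimal sketch on a small made-up data frame (`toy` is hypothetical and only mirrors a few columns of the dataset). It assumes, based on the wording of the error message, that ROSE accepts numeric, integer, and factor columns.

```r
# Hypothetical toy data mirroring a few columns of the fraud dataset
toy <- data.frame(
  type     = c("PAYMENT", "TRANSFER", "CASH_OUT"),
  amount   = c(9840, 181, 181),
  nameOrig = c("C1231006815", "C1305486145", "C840083671"),
  isFraud  = c(0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# First class of every column, as a named character vector
col_classes <- vapply(toy, function(x) class(x)[1], character(1))

# Columns that are neither numeric/integer nor factor
# (assumed here to be the ones ROSE rejects)
unsupported <- names(col_classes)[!col_classes %in% c("numeric", "integer", "factor")]
print(unsupported)
```

Running the same two lines against the real dataset should point to type, nameOrig, and nameDest.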
library(dplyr)
library(ROSE)

# Work on a small sample first
dummy2 <- head(dataset, 100)
dummy2$isFraud <- as.factor(dummy2$isFraud)

# Additional part: convert the character columns to factors
dummy2 <- dummy2 %>%
  mutate(type = factor(type),
         nameDest = factor(nameDest),
         nameOrig = factor(nameOrig))

dummy3 <- ROSE(isFraud ~ ., data = dummy2, N = 500, seed = 111)$data
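Since nameOrig and nameDest are account identifiers with a nearly unique value in every row, one possible alternative is to drop them instead of converting them to factors. The sketch below illustrates this on synthetic data, not on the real dataset: the `toy` data frame and its values are made up, and only `ROSE()` with the same arguments used above is taken from the ROSE package.

```r
library(ROSE)

# Hypothetical synthetic data: 180 legitimate rows and 20 fraud rows
set.seed(1)
n <- 200
toy <- data.frame(
  type          = sample(c("PAYMENT", "TRANSFER", "CASH_OUT"), n, replace = TRUE),
  amount        = runif(n, 0, 10000),
  oldbalanceOrg = runif(n, 0, 50000),
  nameOrig      = paste0("C", seq_len(n)),
  nameDest      = paste0("M", seq_len(n)),
  isFraud       = rep(c(0L, 1L), times = c(180, 20)),
  stringsAsFactors = FALSE
)

# Drop the identifier columns rather than turning them into factors
toy$nameOrig <- NULL
toy$nameDest <- NULL

# Convert the remaining character column and the response to factors
toy$type    <- factor(toy$type)
toy$isFraud <- factor(toy$isFraud)

# Generate a synthetic sample of 500 rows
balanced <- ROSE(isFraud ~ ., data = toy, N = 500, seed = 111)$data
table(balanced$isFraud)
```

Whether dropping these columns is appropriate depends on the modelling goal. If account-level information matters, it may be better to derive numeric features from the identifiers first.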