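This is my first time working with neural networks, and I am trying to convert the numeric "Ratings" variable into a one-hot variable. However, I am not sure how to do this correctly or how to implement it in my model. The ratings range from 1 to 3, and I binarized them as follows: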
data <- data %>% mutate(Ratings_1 = ifelse(Ratings=='1', 1, 0),
Ratings_2 = ifelse(Ratings=='2', 1, 0),
Ratings_3 = ifelse(Ratings=='3', 1, 0))
I want the variable to be binary while still keeping the three rating levels 1-3.
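I would now like to know how to turn Ratings_1, Ratings_2, and Ratings_3 back into a single numeric variable (Ratings, but now binarized with three different options) so that I can use it as my dependent variable, or whether this is even necessary in a neural network. The goal of my NN is to predict the rating category ("low" or "high" rating) in mobile game data. I apologize if this question is abstract; I am very new to NNs.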
structure(list(AUR = c(4, 3.5, 3, 3.5, 3.5, 3), URC = c(3553,
284, 8376, 190394, 28, 47), Price = c(2.99, 1.99, 0, 0, 2.99,
0), Size = c(15853568, 12328960, 674816, 21552128, 34689024,
48672768), HasSubtitle = c(0, 0, 0, 0, 0, 1), InAppSum = c(0,
0, 0, 0, 0, 1.99), InAppMin = c(0, 0, 0, 0, 0, 1.99), InAppMax = c(0,
0, 0, 0, 0, 1.99), InAppCount = c(0, 0, 0, 0, 0, 1), InAppAvg = c(0,
0, 0, 0, 0, 1.99), descriptionTermCount = c(263, 204, 97, 272,
365, 368), LanguagesCount = c(17, 1, 1, 17, 15, 1), EngSupported = c(2,
2, 2, 2, 2, 2), GenreCount = c(2, 2, 2, 2, 3, 3), months = c(7,
7, 7, 7, 7, 7), monthsSinceUpdate = c(29, 17, 25, 29, 15, 6),
GameFree = c(0, 0, 0, 0, 0, 1), Ratings = c(3, 3, 3, 3, 2,
3), Ratings_1 = c(0, 0, 0, 0, 0, 0), Ratings_2 = c(0, 0,
0, 0, 1, 0), Ratings_3 = c(1, 1, 1, 1, 0, 1)), row.names = c(NA,
6L), class = "data.frame")
data <- dff
data2 <- mutate_if(data, is.factor,as.numeric)
data3 <- lapply(data2, function(x) as.numeric(as.character(x)))
data <- data.frame(data3)
data <- data %>% mutate(Ratings_1 = ifelse(Ratings=='1', 1, 0),
Ratings_2 = ifelse(Ratings=='2', 1, 0),
Ratings_3 = ifelse(Ratings=='3', 1, 0))
data$ID <- NULL
data$AgeRating <- NULL
n <- neuralnet(Ratings ~ AUR + URC + Price + Size + HasSubtitle + InAppSum + InAppMin + InAppMax
               + InAppCount + InAppAvg + descriptionTermCount + LanguagesCount + EngSupported + GenreCount
               + months + monthsSinceUpdate + GameFree,
               data = data,
               hidden = c(5, 2),
               startweights = NULL,
               linear.output = FALSE,
               lifesign = 'full',
               rep = 1)
You can use the max.col function, which returns, for each row, the index of the column holding the largest value:
dat$Ratings <- max.col(dat[,startsWith(names(dat),"Ratings")])
dat
   Ratings1 Ratings2 Ratings3 Ratings
1         1        0        0       1
2         1        0        0       1
3         1        0        0       1
4         1        0        0       1
5         0        1        0       2
6         0        1        0       2
7         0        1        0       2
8         1        0        0       1
9         0        0        1       3
10        0        0        1       3
Data:
dat <- structure(list(Ratings1 = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 0), Ratings2 = c(0,
0, 0, 0, 1, 1, 1, 0, 0, 0), Ratings3 = c(0, 0, 0, 0, 0, 0, 0,
0, 1, 1)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10"))
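For the column names used in the question, a round trip might look like the sketch below. It uses only base R and a small made-up vector of ratings (the toy data and the names df, Recovered, and scores are assumptions for illustration, not taken from the question). Two details matter here. First, the question's data frame already contains a column named Ratings, so selecting with startsWith(names(df), "Ratings") would pick up that column too; the prefix "Ratings_" avoids this. Second, max.col breaks ties at random by default, so ties.method = "first" is set to keep the result deterministic. The recovered index equals the rating only because the indicator columns are ordered 1, 2, 3.

```r
# Hypothetical toy data mirroring the question's column names.
df <- data.frame(Ratings = c(3, 3, 2, 1, 3, 2))

# One-hot encode: one 0/1 indicator column per rating level 1..3.
for (k in 1:3) {
  df[[paste0("Ratings_", k)]] <- as.numeric(df$Ratings == k)
}

# Select only the indicator columns; "Ratings_" excludes "Ratings" itself.
onehot_cols <- startsWith(names(df), "Ratings_")

# Recover the class index from the position of the 1 in each row.
df$Recovered <- max.col(as.matrix(df[, onehot_cols]), ties.method = "first")

# The round trip should reproduce the original ratings.
stopifnot(all(df$Recovered == df$Ratings))

# The same call turns a matrix of per-class scores (for example, the
# outputs of a network with three output units) into predicted classes.
# These scores are made up for illustration.
scores <- matrix(c(0.1, 0.2, 0.7,
                   0.6, 0.3, 0.1), nrow = 2, byrow = TRUE)
predicted <- max.col(scores, ties.method = "first")
print(predicted)
```

The last step is the part most likely to be useful for the network itself: if the model is trained with one output per rating level, max.col on the output matrix gives the predicted level for each row.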