One-hot encoding in R



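This is my first time working with a neural network, and I'm trying to convert the "Ratings" numbers into a one-hot variable. However, I'm not sure how to do this correctly or how to implement it in my model. As you can see, the ratings range from 1-3; I binarized them as follows: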

data <- data %>% mutate(Ratings_1 = ifelse(Ratings == 1, 1, 0),
                        Ratings_2 = ifelse(Ratings == 2, 1, 0),
                        Ratings_3 = ifelse(Ratings == 3, 1, 0))
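A more compact way to build the same three indicator columns is base R's model.matrix() with the intercept removed. This is a minimal sketch, assuming Ratings holds the integer values 1-3; the toy vector below is the Ratings column from the question's sample data, and rating_f and one_hot are illustrative names.

```r
# Toy data: the Ratings column from the question's sample rows
data <- data.frame(Ratings = c(3, 3, 3, 3, 2, 3))

# Declare all three levels so that a column is created even for a
# rating that does not occur in the data (here, rating 1)
rating_f <- factor(data$Ratings, levels = 1:3)

# "- 1" drops the intercept, so model.matrix() returns one 0/1 column per level
one_hot <- model.matrix(~ rating_f - 1)
colnames(one_hot) <- paste0("Ratings_", 1:3)

data <- cbind(data, one_hot)
print(data)
```

Because the factor carries all three levels, the encoding stays consistent between a training set and a test set even if one of them happens to lack a rating.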

I want the variable to be binary, but still keep the three rating levels 1-3.

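I'm now wondering how to turn Ratings_1, Ratings_2 and Ratings_3 into a single numeric variable (the rating, but now binary with three options) so that I can use it as my dependent variable, or whether that is even necessary in a neural network. The goal of my NN is to predict the rating category ("low" or "high" rating) in mobile game data. Sorry if this question is abstract; I'm very new to neural networks.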

structure(list(AUR = c(4, 3.5, 3, 3.5, 3.5, 3), URC = c(3553, 
284, 8376, 190394, 28, 47), Price = c(2.99, 1.99, 0, 0, 2.99, 
0), Size = c(15853568, 12328960, 674816, 21552128, 34689024, 
48672768), HasSubtitle = c(0, 0, 0, 0, 0, 1), InAppSum = c(0, 
0, 0, 0, 0, 1.99), InAppMin = c(0, 0, 0, 0, 0, 1.99), InAppMax = c(0, 
0, 0, 0, 0, 1.99), InAppCount = c(0, 0, 0, 0, 0, 1), InAppAvg = c(0, 
0, 0, 0, 0, 1.99), descriptionTermCount = c(263, 204, 97, 272, 
365, 368), LanguagesCount = c(17, 1, 1, 17, 15, 1), EngSupported = c(2, 
2, 2, 2, 2, 2), GenreCount = c(2, 2, 2, 2, 3, 3), months = c(7, 
7, 7, 7, 7, 7), monthsSinceUpdate = c(29, 17, 25, 29, 15, 6), 
GameFree = c(0, 0, 0, 0, 0, 1), Ratings = c(3, 3, 3, 3, 2, 
3), Ratings_1 = c(0, 0, 0, 0, 0, 0), Ratings_2 = c(0, 0, 
0, 0, 1, 0), Ratings_3 = c(1, 1, 1, 1, 0, 1)), row.names = c(NA, 
6L), class = "data.frame")
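library(dplyr)
library(neuralnet)

data <- dff  # dff: the full data set (not included in the question)
data2 <- mutate_if(data, is.factor, as.numeric)
data3 <- lapply(data2, function(x) as.numeric(as.character(x)))
data <- data.frame(data3)

# one-hot encode the three rating levels
data <- data %>% mutate(Ratings_1 = ifelse(Ratings == 1, 1, 0),
                        Ratings_2 = ifelse(Ratings == 2, 1, 0),
                        Ratings_3 = ifelse(Ratings == 3, 1, 0))
data$ID <- NULL
data$AgeRating <- NULL

# Note: with linear.output = FALSE the default logistic activation is also applied
# to the output neuron, so predictions lie in (0, 1) while Ratings is coded 1-3
n <- neuralnet(Ratings ~ AUR + URC + Price + Size + HasSubtitle + InAppSum + InAppMin + InAppMax
               + InAppCount + InAppAvg + descriptionTermCount + LanguagesCount + EngSupported + GenreCount
               + months + monthsSinceUpdate + GameFree,
               data = data,
               hidden = c(5, 2),
               startweights = NULL,
               linear.output = FALSE,
               lifesign = 'full',
               rep = 1)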
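One way to use the one-hot columns directly, sketched here rather than taken from the question: neuralnet accepts formulas with several responses joined by + on the left-hand side (one output neuron per response), so Ratings_1 + Ratings_2 + Ratings_3 can serve as the dependent variable instead of the single 1-3 column. Building the long formula with paste() and as.formula() also avoids typing every predictor by hand. The names predictors, responses and f are illustrative; the neuralnet() call is left commented out so the sketch runs without the package.

```r
# Predictor names copied from the question's neuralnet() call
predictors <- c("AUR", "URC", "Price", "Size", "HasSubtitle", "InAppSum",
                "InAppMin", "InAppMax", "InAppCount", "InAppAvg",
                "descriptionTermCount", "LanguagesCount", "EngSupported",
                "GenreCount", "months", "monthsSinceUpdate", "GameFree")

# Left-hand side: the three one-hot columns instead of the single Ratings column
responses <- c("Ratings_1", "Ratings_2", "Ratings_3")

f <- as.formula(paste(paste(responses, collapse = " + "), "~",
                      paste(predictors, collapse = " + ")))
print(f)

# Hypothetical usage with the question's settings:
# n <- neuralnet(f, data = data, hidden = c(5, 2), linear.output = FALSE)
```

The network's output then has one column per rating, and each row can be turned back into a single rating by taking the column with the largest value, which is what max.col() does.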

You can use the max.col function:

dat$Ratings <- max.col(dat[,startsWith(names(dat),"Ratings")])
dat
   Ratings1 Ratings2 Ratings3 Ratings
1         1        0        0       1
2         1        0        0       1
3         1        0        0       1
4         1        0        0       1
5         0        1        0       2
6         0        1        0       2
7         0        1        0       2
8         1        0        0       1
9         0        0        1       3
10        0        0        1       3

Data:

dat <- structure(list(Ratings1 = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 0), Ratings2 = c(0, 
0, 0, 0, 1, 1, 1, 0, 0, 0), Ratings3 = c(0, 0, 0, 0, 0, 0, 0, 
0, 1, 1)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10"))
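The code above uses columns named Ratings1-Ratings3. The question's data instead has Ratings_1-Ratings_3 and already contains a Ratings column, which startsWith(names(data), "Ratings") would also match, so max.col() would pick up the rating values themselves. A minimal sketch on the question's six sample rows, selecting columns by the "Ratings_" prefix; it also assumes the columns are ordered 1, 2, 3, because max.col() returns a column position, not a label.

```r
# The one-hot columns from the question's sample data (6 rows)
data <- data.frame(Ratings_1 = c(0, 0, 0, 0, 0, 0),
                   Ratings_2 = c(0, 0, 0, 0, 1, 0),
                   Ratings_3 = c(1, 1, 1, 1, 0, 1))

# The underscore keeps a plain "Ratings" column out of the selection
onehot_cols <- startsWith(names(data), "Ratings_")

# ties.method = "first" makes ties deterministic (the default is "random");
# ties only occur in rows that are not valid one-hot rows, e.g. all zeros
data$Ratings <- max.col(data[, onehot_cols], ties.method = "first")
print(data)
```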
