Data normalization for Keras in R



I am using Keras and TensorFlow in R. To normalize the data, I wrote the following function:

df <- data.frame(
  Column_1 = seq(1, 10),
  Column_2 = seq(11, 20)
)

normalization_data <- function(df) {
  data <- (df$Column_1 - mean(df$Column_1)) / sd(df$Column_1)
  return(data)
}

This function works and gives the result I expect, but only for a single column. The output is shown below:

normalized_data<-  normalization_data(df)
normalized_data
[1] -1.4863011 -1.1560120 -0.8257228 -0.4954337 -0.1651446  0.1651446  0.4954337  0.8257228  1.1560120
[10]  1.4863011

So, can anyone help me change this function so that it also handles all the other columns in the data frame?

This functionality is already included in base R. Suppose we have a numeric matrix or data frame:

tst <- matrix(rnorm(n <- 30, 5), nrow = n/3)
tst
#>           [,1]     [,2]     [,3]
#>  [1,] 4.829019 5.512795 5.079292
#>  [2,] 5.936711 5.609443 5.365723
#>  [3,] 5.532721 5.933239 5.419975
#>  [4,] 4.857640 3.771058 6.097233
#>  [5,] 4.616931 6.160274 3.499183
#>  [6,] 4.639770 6.391719 2.508426
#>  [7,] 6.710327 4.295331 5.860477
#>  [8,] 4.110706 5.283023 5.222424
#>  [9,] 5.422139 4.204371 5.724206
#> [10,] 4.413287 6.060544 5.768139

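Now scale the object. By default, scale() subtracts each column's mean and divides by its standard deviation, and it stores both values as attributes on the result: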

tst_scaled <- scale(tst)
tst_scaled
#>             [,1]        [,2]        [,3]
#>  [1,] -0.3519527  0.20749667  0.02159387
#>  [2,]  1.0508778  0.31270353  0.27115886
#>  [3,]  0.5392462  0.66517525  0.31842816
#>  [4,] -0.3157060 -1.68848936  0.90851618
#>  [5,] -0.6205499  0.91231582 -1.35513839
#>  [6,] -0.5916256  1.16425789 -2.21837548
#>  [7,]  2.0306182 -1.11778652  0.70223277
#>  [8,] -1.2616555 -0.04262486  0.14630372
#>  [9,]  0.3992004 -1.21680202  0.58350100
#> [10,] -0.8784529  0.80375360  0.62177933
#> attr(,"scaled:center")
#> [1] 5.106925 5.322180 5.054508
#> attr(,"scaled:scale")
#> [1] 0.7896126 0.9186446 1.1477238
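
Applied to the data frame from the question, the same call might look like the minimal sketch below. The names `df_scaled` and `df_scaled_df` are illustrative and not part of the original answer; the conversion back to a data frame is only needed if later code expects one.

```r
# Data frame from the question
df <- data.frame(
  Column_1 = seq(1, 10),
  Column_2 = seq(11, 20)
)

# scale() standardizes every column at once and returns a matrix
df_scaled <- scale(df)

# Convert back to a data frame if one is needed (illustrative name)
df_scaled_df <- as.data.frame(df_scaled)
head(df_scaled_df, 3)
```

Keras generally expects matrices or arrays as input, so in many cases the matrix returned by scale() can be used directly.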

Conversely, to unscale the object, multiply each column by its stored scale and add back its stored center:

tst_unscaled <- tst_scaled %*%
  diag(attr(tst_scaled, "scaled:scale")) +
  matrix(
    rep(attr(tst_scaled, "scaled:center"), nrow(tst_scaled)),
    ncol = ncol(tst_scaled),
    byrow = TRUE
  )
tst_unscaled
#>           [,1]     [,2]     [,3]
#>  [1,] 4.829019 5.512795 5.079292
#>  [2,] 5.936711 5.609443 5.365723
#>  [3,] 5.532721 5.933239 5.419975
#>  [4,] 4.857640 3.771058 6.097233
#>  [5,] 4.616931 6.160274 3.499183
#>  [6,] 4.639770 6.391719 2.508426
#>  [7,] 6.710327 4.295331 5.860477
#>  [8,] 4.110706 5.283023 5.222424
#>  [9,] 5.422139 4.204371 5.724206
#> [10,] 4.413287 6.060544 5.768139
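
The same inverse transform can also be written with base R's sweep(), which some readers may find easier to follow than the matrix product. This is an alternative sketch, not the answer's original code; the seed and the names `centers` and `scales` are assumptions added for illustration.

```r
set.seed(1)  # hypothetical seed, only so the sketch is reproducible
tst <- matrix(rnorm(30, mean = 5), nrow = 10)
tst_scaled <- scale(tst)

# Per-column statistics stored by scale()
centers <- attr(tst_scaled, "scaled:center")
scales  <- attr(tst_scaled, "scaled:scale")

# Multiply each column by its scale, then add back its center
tst_unscaled <- sweep(tst_scaled, 2, scales, FUN = "*")
tst_unscaled <- sweep(tst_unscaled, 2, centers, FUN = "+")

# Compare within numerical tolerance, ignoring the leftover scale() attributes
all.equal(tst_unscaled, tst, check.attributes = FALSE)
```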

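Confirm that the original and unscaled objects are the same. (For floating-point data in general, all.equal() is the safer check, because the round trip is not guaranteed to be exact to the last bit.)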

identical(tst_unscaled, tst)
#> [1] TRUE
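
Because the center and scale are kept as attributes, they can be reused on new data. In a Keras workflow this is typically how test data gets the same transform as training data. The sketch below assumes a hypothetical train/test split (`train`, `test`) that does not appear in the question.

```r
set.seed(42)  # hypothetical seed for reproducibility
train <- matrix(rnorm(40, mean = 5), ncol = 2)  # hypothetical training data
test  <- matrix(rnorm(10, mean = 5), ncol = 2)  # hypothetical test data

train_scaled <- scale(train)

# Reuse the training statistics so the test data goes through the same transform
test_scaled <- scale(
  test,
  center = attr(train_scaled, "scaled:center"),
  scale  = attr(train_scaled, "scaled:scale")
)
```

Scaling the test set with its own mean and standard deviation would put the two sets on slightly different scales, which is why the training statistics are passed in explicitly here.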

The function can be changed to operate on a single vector:

normalization_vec <- function(x) {
  (x - mean(x)) / sd(x)
}

and then applied to every column with lapply():

data.frame(lapply(df, normalization_vec))
     Column_1   Column_2
1  -1.4863011 -1.4863011
2  -1.1560120 -1.1560120
3  -0.8257228 -0.8257228
4  -0.4954337 -0.4954337
5  -0.1651446 -0.1651446
6   0.1651446  0.1651446
7   0.4954337  0.4954337
8   0.8257228  0.8257228
9   1.1560120  1.1560120
10  1.4863011  1.4863011

Alternatively, the code can be wrapped in a single function:

normalization_data <- function(dat) {
  data.frame(lapply(dat, function(x) {
    (x - mean(x)) / sd(x)
  }))
}
normalization_data(df)
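
The function above assumes every column is numeric. If the data frame also holds non-numeric columns, one possible variation is to scale only the numeric ones and leave the rest untouched. The function `normalize_numeric` and the data frame `mixed` are hypothetical names used only for this sketch.

```r
# Hypothetical variant: standardize numeric columns only
normalize_numeric <- function(dat) {
  # Flag which columns are numeric
  is_num <- vapply(dat, is.numeric, logical(1))
  # Replace only those columns with their standardized values
  dat[is_num] <- lapply(dat[is_num], function(x) (x - mean(x)) / sd(x))
  dat
}

# Illustrative data with one numeric and one character column
mixed <- data.frame(
  Column_1 = seq(1, 10),
  Label    = rep(c("a", "b"), 5)
)
result <- normalize_numeric(mixed)
result
```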
