I am using Keras and TensorFlow in R. To normalize my data, I wrote the following function:
df <- data.frame(
  Column_1 = seq(1, 10),
  Column_2 = seq(11, 20)
)
normalization_data <- function(df) {
  data <- (df$Column_1 - mean(df$Column_1)) / sd(df$Column_1)
  return(data)
}
This function works and gives the result I expect, but only for one column. You can see the output below:
normalized_data <- normalization_data(df)
normalized_data
[1] -1.4863011 -1.1560120 -0.8257228 -0.4954337 -0.1651446 0.1651446 0.4954337 0.8257228 1.1560120
[10] 1.4863011
So, can anyone help me change this function so that it handles all the other columns in the data frame as well?
This functionality is already included in base R. Suppose we have a numeric matrix or data frame:
n <- 30
tst <- matrix(rnorm(n, mean = 5), nrow = n / 3)
tst
#> [,1] [,2] [,3]
#> [1,] 4.829019 5.512795 5.079292
#> [2,] 5.936711 5.609443 5.365723
#> [3,] 5.532721 5.933239 5.419975
#> [4,] 4.857640 3.771058 6.097233
#> [5,] 4.616931 6.160274 3.499183
#> [6,] 4.639770 6.391719 2.508426
#> [7,] 6.710327 4.295331 5.860477
#> [8,] 4.110706 5.283023 5.222424
#> [9,] 5.422139 4.204371 5.724206
#> [10,] 4.413287 6.060544 5.768139
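Now scale the object with scale(), which centers each column on its mean and divides it by its standard deviation: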
tst_scaled <- scale(tst)
tst_scaled
#> [,1] [,2] [,3]
#> [1,] -0.3519527 0.20749667 0.02159387
#> [2,] 1.0508778 0.31270353 0.27115886
#> [3,] 0.5392462 0.66517525 0.31842816
#> [4,] -0.3157060 -1.68848936 0.90851618
#> [5,] -0.6205499 0.91231582 -1.35513839
#> [6,] -0.5916256 1.16425789 -2.21837548
#> [7,] 2.0306182 -1.11778652 0.70223277
#> [8,] -1.2616555 -0.04262486 0.14630372
#> [9,] 0.3992004 -1.21680202 0.58350100
#> [10,] -0.8784529 0.80375360 0.62177933
#> attr(,"scaled:center")
#> [1] 5.106925 5.322180 5.054508
#> attr(,"scaled:scale")
#> [1] 0.7896126 0.9186446 1.1477238
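Applied to the data frame from the question, the same call might look like the sketch below. scale() returns a matrix, so as.data.frame() is used to convert the result back; the name df_scaled is only illustrative.

```r
# Data frame from the question
df <- data.frame(
  Column_1 = seq(1, 10),
  Column_2 = seq(11, 20)
)

# scale() centers each column on its mean and divides by its standard
# deviation. It returns a matrix, so convert back if a data frame is needed.
df_scaled <- as.data.frame(scale(df))

# Sanity check: each column should now have mean ~0 and standard deviation 1
colMeans(df_scaled)
sapply(df_scaled, sd)
```

Both columns end up with the same values here because each is an evenly spaced sequence of ten numbers.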
Conversely, to unscale the object, multiply each column by its stored scale and add back its stored center:
tst_unscaled <- tst_scaled %*%
  diag(attr(tst_scaled, "scaled:scale")) +
  matrix(
    rep(attr(tst_scaled, "scaled:center"), nrow(tst_scaled)),
    ncol = ncol(tst_scaled),
    byrow = TRUE
  )
tst_unscaled
#> [,1] [,2] [,3]
#> [1,] 4.829019 5.512795 5.079292
#> [2,] 5.936711 5.609443 5.365723
#> [3,] 5.532721 5.933239 5.419975
#> [4,] 4.857640 3.771058 6.097233
#> [5,] 4.616931 6.160274 3.499183
#> [6,] 4.639770 6.391719 2.508426
#> [7,] 6.710327 4.295331 5.860477
#> [8,] 4.110706 5.283023 5.222424
#> [9,] 5.422139 4.204371 5.724206
#> [10,] 4.413287 6.060544 5.768139
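Confirm that the original and unscaled objects match. all.equal() compares within floating-point tolerance, which is more robust than identical() after a scale/unscale round trip:
all.equal(tst_unscaled, tst)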
#> [1] TRUE
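The unscaling step can also be written with sweep(), which applies a per-column statistic without building a diagonal matrix. This is only a sketch: unscale() is a hypothetical helper name, not a base R function, and set.seed() is added here purely so the example is reproducible.

```r
# Example matrix; the seed is an assumption made for reproducibility
set.seed(1)
tst <- matrix(rnorm(30, mean = 5), nrow = 10)
tst_scaled <- scale(tst)

# Hypothetical helper: reverse scale() using the attributes it stored
unscale <- function(x) {
  center <- attr(x, "scaled:center")
  spread <- attr(x, "scaled:scale")
  out <- sweep(x, 2, spread, `*`)    # multiply each column by its sd
  out <- sweep(out, 2, center, `+`)  # add back each column's mean
  # Drop the scaling attributes so the result is a plain matrix again
  attr(out, "scaled:center") <- NULL
  attr(out, "scaled:scale") <- NULL
  out
}

tst_unscaled <- unscale(tst_scaled)
all.equal(tst_unscaled, tst)
```

The comparison uses all.equal() because the round trip is only exact up to floating-point rounding.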
The function can be changed to operate on a single vector:
normalization_vec <- function(x) {
  (x - mean(x)) / sd(x)
}
and then applied to every column with lapply():
data.frame(lapply(df, normalization_vec))
Column_1 Column_2
1 -1.4863011 -1.4863011
2 -1.1560120 -1.1560120
3 -0.8257228 -0.8257228
4 -0.4954337 -0.4954337
5 -0.1651446 -0.1651446
6 0.1651446 0.1651446
7 0.4954337 0.4954337
8 0.8257228 0.8257228
9 1.1560120 1.1560120
10 1.4863011 1.4863011
Alternatively, the code can be wrapped in a single function:
normalization_data <- function(dat) {
  data.frame(lapply(dat, function(x) {
    (x - mean(x)) / sd(x)
  }))
}
normalization_data(df)
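If the data frame also contains non-numeric columns, mean() and sd() will not give usable results for them. One possible variation, sketched below, standardizes only the numeric columns and leaves the rest untouched. The function name normalize_numeric, the data frame df2, and its Label column are all assumptions made for illustration.

```r
# Hypothetical helper: standardize numeric columns, leave others unchanged
normalize_numeric <- function(dat) {
  is_num <- vapply(dat, is.numeric, logical(1))
  dat[is_num] <- lapply(dat[is_num], function(x) (x - mean(x)) / sd(x))
  dat
}

# Illustrative data: the question's columns plus an assumed label column
df2 <- data.frame(
  Column_1 = seq(1, 10),
  Column_2 = seq(11, 20),
  Label = rep(c("a", "b"), 5)
)

res <- normalize_numeric(df2)
res
```

Assigning back into dat[is_num] keeps the original column order and names, which a plain data.frame(lapply(...)) over a subset would not.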