TensorFlow Keras Glorot normal initializer does not have a mean of zero



According to the Glorot Normal documentation, the initial weights should be drawn from a normal distribution with a mean of zero:

Draws samples from a truncated normal distribution centered on 0

But the mean does not appear to be zero. Am I missing something?

Please find the code below:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

print(tf.__version__)

initializer = tf.keras.initializers.GlorotNormal(seed=1234)

model = Sequential([Dense(units=3, input_shape=[1], kernel_initializer=initializer,
                          bias_initializer=initializer),
                    Dense(units=1, kernel_initializer=initializer,
                          bias_initializer=initializer)])

batch_size = 1
x = np.array([-1.0, 0, 1, 2, 3, 4.0], dtype='float32')
y = np.array([-3, -1.0, 1, 3.0, 5.0, 7.0], dtype='float32')
x = np.reshape(x, (-1, 1))

# Prepare the training dataset.
train_dataset = tf.data.Dataset.from_tensor_slices((x, y))
train_dataset = train_dataset.shuffle(buffer_size=64).batch(batch_size)

epochs = 1
learning_rate = 1e-3

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)

for epoch in range(epochs):
    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)  # Logits for this minibatch
            # Compute the loss value for this minibatch.
            loss_value = tf.keras.losses.MSE(y_batch_train, logits)

        Initial_Weights_1st_Hidden_Layer = model.trainable_weights[0]
        Mean_Weights_Hidden_Layer = tf.reduce_mean(Initial_Weights_1st_Hidden_Layer)

        Initial_Weights_Output_Layer = model.trainable_weights[2]
        Mean_Weights_Output_Layer = tf.reduce_mean(Initial_Weights_Output_Layer)

        Initial_Bias_1st_Hidden_Layer = model.trainable_weights[1]
        Mean_Bias_Hidden_Layer = tf.reduce_mean(Initial_Bias_1st_Hidden_Layer)

        Initial_Bias_Output_Layer = model.trainable_weights[3]
        Mean_Bias_Output_Layer = tf.reduce_mean(Initial_Bias_Output_Layer)

        if epoch == 0 and step == 0:
            print('\n Initial Weights of First-Hidden Layer = ', Initial_Weights_1st_Hidden_Layer)
            print('\n Mean of Weights of Hidden Layer = %s' % Mean_Weights_Hidden_Layer.numpy())

            print('\n Initial Weights of Second-Hidden/Output Layer = ', Initial_Weights_Output_Layer)
            print('\n Mean of Weights of Output Layer = %s' % Mean_Weights_Output_Layer.numpy())

            print('\n Initial Bias of First-Hidden Layer = ', Initial_Bias_1st_Hidden_Layer)
            print('\n Mean of Bias of Hidden Layer = %s' % Mean_Bias_Hidden_Layer.numpy())

            print('\n Initial Bias of Second-Hidden/Output Layer = ', Initial_Bias_Output_Layer)
            print('\n Mean of Bias of Output Layer = %s' % Mean_Bias_Output_Layer.numpy())

Because you are not drawing many samples from that distribution:

initializer = tf.keras.initializers.GlorotNormal(seed = 1234)
mean = tf.reduce_mean(initializer(shape=(1, 3))).numpy()
print(mean) # -0.29880756
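To see why a 3-sample mean this far from zero is unsurprising, here is a minimal sketch of the scale involved. Per the Keras docs, GlorotNormal uses stddev = sqrt(2 / (fan_in + fan_out)); for shape (1, 3) that gives roughly 0.71, so the standard error of a 3-sample mean is about 0.41 and a mean of -0.299 is well within one standard error:

```python
import math

# Glorot normal draws from a truncated normal with
# stddev = sqrt(2 / (fan_in + fan_out)); for shape (1, 3):
fan_in, fan_out = 1, 3
stddev = math.sqrt(2.0 / (fan_in + fan_out))
print(stddev)            # ~0.7071

# Standard error of the mean of n samples is stddev / sqrt(n).
standard_error = stddev / math.sqrt(3)
print(standard_error)    # ~0.4082
```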

However, if you increase the number of samples:

initializer = tf.keras.initializers.GlorotNormal(seed = 1234)
mean = tf.reduce_mean(initializer(shape=(1, 500))).numpy()
print(mean) # 0.003004579

The same thing applies to your example. If you increase the units of the first Dense layer to 500, you should see 0.003004579 with the same seed.
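The convergence itself has nothing to do with TensorFlow; it is just the law of large numbers. A minimal sketch with plain NumPy (using an ordinary, untruncated normal with the Glorot stddev for shape (1, 500) as a stand-in) shows the sample mean drifting toward the true mean of 0 as the sample grows:

```python
import math
import numpy as np

rng = np.random.default_rng(seed=1234)
stddev = math.sqrt(2.0 / (1 + 500))  # Glorot stddev for shape (1, 500)

# The sample mean approaches the true mean (0) as n grows.
for n in (3, 500, 50_000):
    samples = rng.normal(loc=0.0, scale=stddev, size=n)
    print(n, samples.mean())
```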