Suppose we have a three-layer neural network, and consider the weights between the Hidden layer and the Outputs layer: W, b, where W is a matrix of shape (N, M). By default, all components of W and b are set as trainable in Keras. I know how to set the whole of W or b to be non-trainable, as in the following link:
How to set parameters in Keras to be non-trainable?
What I want is to be able to set only a specific component of W (for example) to be non-trainable. For example, if:
W = [[W11, W12],
     [W21, W22]]
can be rewritten as:
W = [W1, W2] with W1 = [W11, W12] and W2 = [W21, W22]
where both W1 and W2 are of type tf.Variable,
how can I set, for example, W1 to be non-trainable?
I have looked through some other topics, but none of them helped me get what I want. Here are some example links:
Link 1: https://keras.io/guides/transfer_learning/
Link 2: https://github.com/tensorflow/tensorflow/issues/47597
Can anyone help me with this? Thanks in advance.
The tensor W is stored as a single tf.Variable (not four variables w11, w12, w21, w22), and tf.Variable.trainable controls the entire tensor, not a sub-tensor. To make matters worse, within a Keras layer all variables share the same trainable setting, because it is controlled by the tf.keras.layers.Layer.trainable attribute.
To do what you want, you need two variables W1 and W2, each wrapped in a different instance of a layer. Apply each layer to the input to get half of the answer; you can then concatenate to get the full answer.
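For instance, here is a minimal functional-API sketch of that idea (the layer names and sizes are just placeholders): split the rows of W across two Dense layers, freeze one of them, and concatenate their outputs to recover the full product.

import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = tf.keras.Input(shape=(2,))  # hidden activations of size M = 2

# W1 lives in this layer and is frozen (non-trainable).
frozen_part = layers.Dense(1, use_bias=False, name='frozen_part')
frozen_part.trainable = False

# W2 lives in this layer and stays trainable.
trainable_part = layers.Dense(1, use_bias=False, name='trainable_part')

# Each layer computes one row of W times the input; concatenating gives the full output.
outputs = layers.Concatenate()([frozen_part(inputs), trainable_part(inputs)])
model = Model(inputs, outputs)

print([v.name for v in model.trainable_weights])      # only trainable_part's kernel
print([v.name for v in model.non_trainable_weights])  # frozen_part's kernel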
You can create your own layer in Keras. This lets you customize the weights inside the layer, for example whether or not they are trainable.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # suppress Tensorflow messages
import tensorflow as tf
from keras.layers import *
from keras.models import *

# Your custom layer
class Linear(Layer):
    def __init__(self, units=32, **kwargs):
        super(Linear, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=False
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
In Linear, the weight w is trainable and the bias b is non-trainable. Here I have created a training loop on dummy data to visualize the weight updates.
batch_size = 10
input_shape = (batch_size, 5, 5)

## model
model = Sequential()
model.add(Input(shape=input_shape))
model.add(Linear(units=4, name='my_linear_layer'))
model.add(Dense(1))

## dummy dataset
x = tf.random.normal(input_shape)  # dummy input
y = tf.ones((batch_size, 1))       # dummy output

## loss function and optimizer
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-2)

### training loop
epochs = 3
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    tf.print(model.get_layer('my_linear_layer').get_weights())

    # Open a GradientTape to record the operations run
    # during the forward pass, which enables auto-differentiation.
    with tf.GradientTape() as tape:
        # Run the forward pass of the layer.
        # The operations that the layer applies
        # to its inputs are going to be recorded
        # on the GradientTape.
        logits = model(x, training=True)  # Logits for this minibatch

        # Compute the loss value for this minibatch.
        loss_value = loss_fn(y, logits)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss.
    grads = tape.gradient(loss_value, model.trainable_weights)

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
This loop returns the following results:
Start of epoch 0
[array([[ 0.08920084, -0.04294993,  0.06111819,  0.08334437],
       [-0.0369432 , -0.05014499,  0.0305218 , -0.07486793],
       [-0.01227043,  0.09460627, -0.0560123 ,  0.01324316],
       [-0.00255878,  0.00214959, -0.02924518,  0.04721532],
       [-0.05532415, -0.02014978, -0.06785563, -0.07330619]],
      dtype=float32),
 array([ 0.02154647,  0.05153348, -0.00128291, -0.06794706], dtype=float32)]

Start of epoch 1
[array([[ 0.08961578, -0.04327399,  0.06152926,  0.08325274],
       [-0.03829437, -0.04908974,  0.02918325, -0.07456956],
       [-0.01417133,  0.09609085, -0.05789544,  0.01366292],
       [-0.00236284,  0.00199657, -0.02905108,  0.04717206],
       [-0.05536905, -0.02011472, -0.06790011, -0.07329627]],
      dtype=float32),
 array([ 0.02154647,  0.05153348, -0.00128291, -0.06794706], dtype=float32)]

Start of epoch 2
[array([[ 0.09001605, -0.04358549,  0.06192534,  0.08316355],
       [-0.03960795, -0.04806747,  0.02788337, -0.07427685],
       [-0.01599812,  0.09751251, -0.05970317,  0.01406999],
       [-0.00217021,  0.00184666, -0.02886046,  0.04712913],
       [-0.05540781, -0.02008455, -0.06793848, -0.07328764]],
      dtype=float32),
 array([ 0.02154647,  0.05153348, -0.00128291, -0.06794706], dtype=float32)]
As you can see, while the weight w gets updated, the bias b stays the same.
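As a side note, you can also confirm which variables the optimizer will actually update by inspecting the layer's trainable_weights and non_trainable_weights lists; a minimal check, assuming the model defined above has already been built:

linear = model.get_layer('my_linear_layer')
print([v.shape for v in linear.trainable_weights])      # the kernel w, shape (5, 4)
print([v.shape for v in linear.non_trainable_weights])  # the frozen bias b, shape (4,)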
So I was trying to solve a similar problem. What you need to do is first use Keras's functional API. Then put all the trainable weights into one layer and all the non-trainable weights into another layer. Feed the output of the previous layer into both of these layers. Then you can use a TensorFlow Concatenate layer to join those layers back together. Say you have a hidden layer with 5 neurons, 3 of which are trainable and the other 2 non-trainable:
X = Dense(5, activation='relu')(X)  # previous layer
Y = Dense(3, activation='relu', name='trainable_layer')(X)
non_trainable_layer = Dense(2, activation='relu', name='non_trainable_layer')
non_trainable_layer.trainable = False  # freeze the layer itself, not its output tensor
Z = non_trainable_layer(X)
X = Concatenate()([Y, Z])
X = Dense(5, activation='relu')(X)  # layer after the layer with mixed trainable weights
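To sanity-check this, you can wrap the tensors in a Model and verify that only the frozen layer's weights end up in model.non_trainable_weights; a minimal self-contained sketch (the input size of 8 is just a placeholder):

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

inputs = Input(shape=(8,))  # placeholder input size, just for this sketch
X = Dense(5, activation='relu')(inputs)
Y = Dense(3, activation='relu', name='trainable_layer')(X)
non_trainable_layer = Dense(2, activation='relu', name='non_trainable_layer')
non_trainable_layer.trainable = False
Z = non_trainable_layer(X)
outputs = Dense(5, activation='relu')(Concatenate()([Y, Z]))

model = Model(inputs, outputs)
print([v.name for v in model.non_trainable_weights])
# expect only non_trainable_layer's kernel and bias to be listed here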