How does Keras (TensorFlow) compute the shape of the last layer's tensor?



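I am currently working on a convolutional NN that generates images and a recurrent NN that generates audio. I built models for both generators, but for some reason the last layer of the build_audio_generator model yields a tensor (Tensor("model_4/sequential_4/activation_4/Tanh:0")) with shape (?, 1) instead of (?, 28, 28, 1). My question: how do I have to change the code of build_audio_generator so that it has the same shape (?, 28, 28, 1) as build_generator?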

Code:

from keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                          Dropout, Embedding, Flatten, Input, LSTM, Reshape,
                          UpSampling2D, multiply)
from keras.models import Model, Sequential


def build_generator(latent_dim, channels, num_classes):
    model = Sequential()
    # Project the latent vector to a 7x7 feature map with 128 channels
    model.add(Dense(128 * 7 * 7, activation="relu", input_dim=latent_dim))
    model.add(Reshape((7, 7, 128)))
    model.add(BatchNormalization(momentum=0.8))
    # Upsample 7x7 -> 14x14
    model.add(UpSampling2D())
    model.add(Conv2D(128, kernel_size=3, padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(momentum=0.8))
    # Upsample 14x14 -> 28x28
    model.add(UpSampling2D())
    model.add(Conv2D(64, kernel_size=3, padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Conv2D(channels, kernel_size=3, padding='same'))
    model.add(Activation("tanh"))
    model.summary()

    noise = Input(shape=(latent_dim,))
    label = Input(shape=(1,), dtype='int32')

    # Condition the noise on the label via an embedding
    label_embedding = Flatten()(Embedding(num_classes, 100)(label))
    model_input = multiply([noise, label_embedding])
    img = model(model_input)
    return Model([noise, label], img)

def build_audio_generator(latent_dim, num_classes):
    model = Sequential()
    model.add(LSTM(512, input_dim=latent_dim, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(512, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(512))
    model.add(Dense(256))
    model.add(Dropout(0.3))
    model.add(Dense(num_classes))
    model.add(Activation('tanh'))
    model.summary()

    noise = Input(shape=(None, latent_dim,))
    label = Input(shape=(1,), dtype='int32')
    label_embedding = Flatten()(Embedding(num_classes, 100)(label))
    model_input = multiply([noise, label_embedding])
    sound = model(model_input)
    return Model([noise, label], sound)

# Build the generators
generator = build_generator(100, 3, 1)
audio_generator = build_audio_generator(100, 1)

# The generator takes noise and the target label as input
# and generates the corresponding digit of that label
noise = Input(shape=(None, 100,))
label = Input(shape=(1,))
img = generator([noise, label])
audio = audio_generator([noise, label])

print('Audio: ' + str(audio))
print('Audio shape: ' + str(audio.shape))
print('IMG: ' + str(img))
print('IMG shape: ' + str(img.shape))

Console output:

Audio: Tensor("model_4/sequential_4/activation_4/Tanh:0", shape=(?, 1), dtype=float32)
Audio shape: (?, 1)
IMG: Tensor("model_3/sequential_3/activation_3/Tanh:0", shape=(?, 28, 28, 1), dtype=float32)
IMG shape: (?, 28, 28, 1)

I assume you want the audio output to be 3D, don't you?

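Just keep return_sequences=True on all of the LSTM layers. The last LSTM(512) in build_audio_generator is created without it, so it returns only the final time step with shape (?, 512), and the Dense layers that follow reduce that to (?, num_classes), which is (?, 1) here.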
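
To make the shape bookkeeping concrete, here is a minimal pure-Python sketch of the rules involved. It is a toy model, not Keras code: the helper names lstm_shape, dense_shape and audio_generator_shape are made up for illustration, and None plays the role of the unknown dimension that the console output prints as "?". It assumes only the documented behaviour that an LSTM returns (batch, timesteps, units) with return_sequences=True and (batch, units) otherwise, and that Dense changes only the last axis.

```python
def lstm_shape(input_shape, units, return_sequences=False):
    """Output shape of an LSTM fed a (batch, timesteps, features) input."""
    batch, timesteps, _ = input_shape
    if return_sequences:
        # One output vector per time step: the time axis is kept.
        return (batch, timesteps, units)
    # Only the last time step is returned: the time axis is dropped.
    return (batch, units)


def dense_shape(input_shape, units):
    """Dense acts on the last axis only and leaves the others untouched."""
    return input_shape[:-1] + (units,)


def audio_generator_shape(latent_dim, num_classes, last_return_sequences):
    """Follow the layer stack of build_audio_generator.

    Dropout and Activation do not change the shape, so they are skipped.
    """
    shape = (None, None, latent_dim)  # (batch, timesteps, latent_dim)
    shape = lstm_shape(shape, 512, return_sequences=True)
    shape = lstm_shape(shape, 512, return_sequences=True)
    shape = lstm_shape(shape, 512, return_sequences=last_return_sequences)
    shape = dense_shape(shape, 256)
    shape = dense_shape(shape, num_classes)
    return shape


# As posted: the third LSTM drops the time axis.
print(audio_generator_shape(100, 1, last_return_sequences=False))  # (None, 1)
# With return_sequences=True everywhere: the time axis survives.
print(audio_generator_shape(100, 1, last_return_sequences=True))   # (None, None, 1)
```

Note that the second result is a 3D tensor of the form (batch, timesteps, num_classes), not the 4D (?, 28, 28, 1) shape of the image generator.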
