Using GradientTape to compute the gradient of predictions with respect to some tensors



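I am trying to implement WGAN with gradient penalty (GP) in TensorFlow 2.0. Computing the gradient penalty requires the gradient of the critic's predictions with respect to the input images.

To keep this tractable, the gradient is not computed with respect to all input images. Instead, interpolated data points are sampled along the lines between real and fake data points, and those are used as the input.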
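For reference, the interpolation and penalty described above are usually written as follows. The symbols are notation assumed here, not names from my code: D is the critic, x a real image, \tilde{x} a fake image, \alpha a per-sample uniform random weight, and \lambda the penalty coefficient.

```latex
\hat{x} = \alpha x + (1 - \alpha)\,\tilde{x}, \qquad \alpha \sim U[0, 1]

\mathrm{GP} = \lambda \, \mathbb{E}_{\hat{x}}
  \left[ \left( \left\lVert \nabla_{\hat{x}} D(\hat{x}) \right\rVert_2 - 1 \right)^2 \right]
```

The gradient inside the norm is exactly the quantity I am trying to compute: the gradient of the critic output with respect to the interpolated images.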

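To get there, I started with a compute_gradients function that takes some predictions and returns their gradient with respect to some input images. I first wanted to do this with tf.keras.backend.gradients, but it does not work in eager mode, so I am now trying to do it with GradientTape.

This is the code I am using to test it: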

from tensorflow.keras import backend as K
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
import tensorflow as tf
import numpy as np

# Comes from Generative Deep Learning by David Foster
class RandomWeightedAverage(tf.keras.layers.Layer):
    """Provides a (random) weighted average between real and generated image samples"""

    def __init__(self, batch_size):
        super().__init__()
        self.batch_size = batch_size

    def call(self, inputs):
        alpha = K.random_uniform((self.batch_size, 1, 1, 1))
        return (alpha * inputs[0]) + ((1 - alpha) * inputs[1])

# Dummy critic
def make_critic():
    critic = Sequential()
    inputShape = (28, 28, 1)
    critic.add(Conv2D(32, (5, 5), padding="same", strides=(2, 2),
                      input_shape=inputShape))
    critic.add(LeakyReLU(alpha=0.2))
    critic.add(Conv2D(64, (5, 5), padding="same", strides=(2, 2)))
    critic.add(LeakyReLU(alpha=0.2))
    critic.add(Flatten())
    critic.add(Dense(512))
    critic.add(LeakyReLU(alpha=0.2))
    critic.add(Dropout(0.3))
    critic.add(Dense(1))
    return critic

# Gather dataset
((X_train, _), (X_test, _)) = tf.keras.datasets.fashion_mnist.load_data()
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

# Note that I am using test images as fake images for testing purposes
interpolated_img = RandomWeightedAverage(32)([X_train[0:32].astype("float"), X_test[32:64].astype("float")])

# Compute gradients of the predictions with respect to the interpolated images
critic = make_critic()
with tf.GradientTape() as tape:
    y_pred = critic(interpolated_img)
gradients = tape.gradient(y_pred, interpolated_img)

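The gradients come out as None. Am I missing something?

"Gradient of predictions with respect to some tensors ... Am I missing something?"

Yes. You need tape.watch(interpolated_img):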

with tf.GradientTape() as tape:
    tape.watch(interpolated_img)
    y_pred = critic(interpolated_img)

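The gradient tape needs to store intermediate values of the forward pass in order to compute gradients. Usually you want gradients with respect to variables, so by default the tape does not keep a trace of computations that start from plain tensors, probably to save memory.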
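A minimal sketch of that default behavior, using toy values rather than the critic from the question: a trainable tf.Variable is tracked automatically, while a plain tensor is not. The names v, c, grad_v and grad_c are made up for this illustration.

```python
import tensorflow as tf

v = tf.Variable([1.0, 2.0, 3.0])  # trainable variable: watched automatically
c = tf.constant([1.0, 2.0, 3.0])  # plain tensor: not watched by default

# persistent=True only so that gradient() can be called twice on one tape
with tf.GradientTape(persistent=True) as tape:
    y_v = tf.reduce_sum(v * v)
    y_c = tf.reduce_sum(c * c)

grad_v = tape.gradient(y_v, v)  # d/dv sum(v^2) = 2v
grad_c = tape.gradient(y_c, c)  # nothing was recorded for c
del tape  # release the resources held by the persistent tape

print(grad_v)
print(grad_c)
```

Here grad_c is None for the same reason the gradients in the question are None: the interpolated images are a plain tensor, not a variable.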

If you want a gradient with respect to a tensor, you need to tell the tape explicitly.
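Putting this together, a gradient penalty could be sketched roughly as below. This is one possible arrangement under stated assumptions, not the asker's final code: the function name gradient_penalty, the small epsilon inside the square root, and the toy_critic used for the check are all hypothetical, and inputs are assumed to be float32 tensors of shape (batch, height, width, channels).

```python
import tensorflow as tf

def gradient_penalty(critic, real, fake):
    """Mean squared deviation of the critic's gradient norm from 1 (unscaled)."""
    batch = tf.shape(real)[0]
    # One random mixing weight per sample, broadcast over H, W, C
    alpha = tf.random.uniform([batch, 1, 1, 1])
    interpolated = alpha * real + (1.0 - alpha) * fake

    with tf.GradientTape() as tape:
        # interpolated is a plain tensor, so it must be watched explicitly
        tape.watch(interpolated)
        pred = critic(interpolated, training=True)

    grads = tape.gradient(pred, interpolated)
    # Per-sample L2 norm; the epsilon (an assumption) avoids sqrt(0)
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    return tf.reduce_mean(tf.square(norm - 1.0))

# Hypothetical toy critic: sums all pixels, so every gradient entry is 1
def toy_critic(x, training=False):
    return tf.reduce_sum(x, axis=[1, 2, 3])

real = tf.ones([4, 2, 2, 1])
fake = tf.zeros([4, 2, 2, 1])
gp = gradient_penalty(toy_critic, real, fake)
print(float(gp))
```

With the toy critic each sample's gradient has four entries equal to 1, so the norm is 2 and the penalty is about 1 regardless of alpha. In a real training step the result would be multiplied by the penalty coefficient and added to the critic loss, and the call would sit inside an outer tape that tracks the critic's weights.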
