Why does getting the values of model parameters and reassigning new values take longer and longer in TensorFlow



I have a Python function that takes a TensorFlow session and symbolic variables (tensors representing the model parameters, and the gradients of those parameters). I call this function in a loop, and each subsequent call takes longer and longer. So I would like to know what could be causing this.

Here is the code of the function:

def minimize_step(s, params, grads, min_lr, factor, feed_dict, score):
    '''
    Inputs:
        s         - TensorFlow session
        params    - list of nodes representing model parameters
        grads     - list of nodes representing gradients of parameters
        min_lr    - starting learning rate
        factor    - growth factor for the learning rate
        feed_dict - feed dictionary used to evaluate gradients and score
                    Normally it contains X and Y
        score     - score that is minimized
    Result:
        One call of this function makes an update of model parameters.
    '''
    ini_vals = [s.run(param) for param in params]
    grad_vals = [s.run(grad, feed_dict = feed_dict) for grad in grads]
    lr = min_lr
    best_score = None
    while True:
        new_vals = [ini_val - lr * grad for ini_val, grad in zip(ini_vals, grad_vals)]
        for i in range(len(new_vals)):
            s.run(tf.assign(params[i], new_vals[i]))
        score_val = s.run(score, feed_dict = feed_dict)
        if best_score is None or score_val < best_score:
            best_score = score_val
            best_lr = lr
            best_params = new_vals[:]
        else:
            for i in range(len(new_vals)):
                s.run(tf.assign(params[i], best_params[i]))
            break
        lr *= factor
    return best_score, best_lr

Could it be that the symbolic variables representing the model parameters are somehow accumulating old values?

It seems you are missing how tensorflow 1.* is meant to be used. I won't go into detail here, because you can find plenty of resources on the internet; I think this article is enough to understand the concepts behind using tensorflow 1.*.

In your example, you keep adding new nodes to the graph at every iteration.

Let's say this is your execution graph:

import tensorflow as tf
import numpy as np
x = tf.placeholder(tf.float32, (None, 2))
y = tf.placeholder(tf.int32, (None))
res = tf.keras.layers.Dense(2)(x)
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=res, labels=y)
loss_tensor = tf.reduce_mean(xentropy)
lr = tf.placeholder(tf.float32, ())
grads = tf.gradients(loss_tensor, tf.trainable_variables())
weight_updates = [tf.assign(w, w - lr * g) for g, w in zip(grads, tf.trainable_variables())]

Each time weight_updates is executed, the weights of the model are updated.

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # before
    print(sess.run(tf.trainable_variables()))
    # [array([[ 0.7586721 , -0.7465675 ],
    #         [-0.34097505, -0.83986187]], dtype=float32), array([0., 0.], dtype=float32)]
    # after
    evaluated = sess.run(weight_updates,
                         {x: np.random.normal(0, 1, (2, 2)),
                          y: np.random.randint(0, 2, 2),
                          lr: 0.001})
    print(evaluated)
    # [array([[-1.0437444 , -0.7132262 ],
    #         [-0.8282471 , -0.01127395]], dtype=float32), array([ 0.00072743, -0.00072743], dtype=float32)]

In your example, at every step you are adding extra execution paths to the graph instead of reusing the existing ones.
