Dst tensor is not initialized, even with a small batch size

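There are many, many questions about this on SO. The answers all seem straightforward: this is almost certainly a memory error, and reducing the batch size should fix it.

In my case, something else seems to be going on (or I seriously misunderstand how this works).

I have a large set of stimuli, like so: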

train_x.shape  # returns (2352, 131072, 2): about 2.3k stimuli of size 131072x2
train_y.shape  # returns (2352,)

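Of course, one can imagine that this may be too much to load at once. Indeed, creating a simple model without setting any batch size results in an InternalError.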

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten

model = Sequential([
    Flatten(input_shape=(131072, 2)),
    Dense(128, activation=tf.nn.relu),
    Dense(50, activation=tf.nn.relu),
    Dense(1),
])
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
model.fit(train_x, train_y, epochs=5)

This returns the following error:

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

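The logical thing to do is to reduce the batch size. However, setting it to any value between 1 and 2000 returns the same error. This seems to imply that I don't have enough memory to load even a single stimulus. However...

Not just a memory error

If I manually cut my dataset like this: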

# Take first 20 stimuli
smaller_train_x = train_x[0:20,::] # shape is (20, 131072, 2)
smaller_trian_y = train_y[0:20]    # shape is (20, )

If I then fit the model to this smaller dataset, it works and returns no error.

model.fit(smaller_train_x, smaller_trian_y, epochs=5)

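So with a batch_size of a single stimulus, I get a memory error. Yet running on my manually cut dataset of 20 stimuli works fine.

In short, the question is:

As far as I understand,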

# Load in one stimulus at a time
model.fit(train_x, train_y, epochs=5, batch_size=1)

should use about 20 times less memory than

# Load in 20 stimuli at a time
model.fit(smaller_train_x, smaller_trian_y, epochs=5)

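So how can the first one return a memory error?

I am running this in a Jupyter notebook, with Python 3.8 and TensorFlow 2.10.0.

Based on the following experiments, the size of the training array passed to model.fit(...) matters just as much as the batch_size.

train_x: peak GPU memory increases with batch_size, but not linearly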

model.fit(train_x, train_y, epochs=1, batch_size=10, callbacks= [MemoryPrintingCallback()])
#GPU memory details [current: 2.7969 gb, peak: 3.0 gb]
model.fit(train_x, train_y, epochs=1, batch_size=100, callbacks= [MemoryPrintingCallback()])
#GPU memory details [current: 2.7969 gb, peak: 3.0 gb]
model.fit(train_x, train_y, epochs=1, batch_size=1000, callbacks= [MemoryPrintingCallback()])
#GPU memory details [current: 2.7969 gb, peak: 4.0 gb]
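
The MemoryPrintingCallback used in these runs is not shown here. Below is a minimal sketch of what such a callback might look like, assuming it reads tf.config.experimental.get_memory_info at the end of each epoch and that the "gb" figures are GiB. The helper name format_gpu_memory and the device argument are hypothetical, not taken from the original code.

```python
import tensorflow as tf

GIB = 1024 ** 3  # assumption: the "gb" figures in the logs are GiB


def format_gpu_memory(info):
    """Format a {'current': bytes, 'peak': bytes} dict like the log lines above."""
    current = round(info["current"] / GIB, 4)
    peak = round(info["peak"] / GIB, 4)
    return f"GPU memory details [current: {current} gb, peak: {peak} gb]"


class MemoryPrintingCallback(tf.keras.callbacks.Callback):
    """Hypothetical reconstruction: print GPU memory after every epoch."""

    def __init__(self, device="GPU:0"):
        super().__init__()
        self.device = device

    def on_epoch_end(self, epoch, logs=None):
        # get_memory_info raises on machines without the device, so guard it.
        if not tf.config.list_physical_devices("GPU"):
            print("No GPU visible; nothing to report.")
            return
        info = tf.config.experimental.get_memory_info(self.device)
        print(format_gpu_memory(info))
```

The peak value is cumulative for the process, so comparing runs in one notebook session probably needs a kernel restart or a call to tf.config.experimental.reset_memory_stats between them.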

smaller_train_x: for the same batch size, peak GPU memory is lower than in the previous case

model.fit(smaller_train_x, smaller_trian_y, epochs=1, batch_size=10, callbacks= [MemoryPrintingCallback()])
#GPU memory details [current: 0.5 gb, peak: 0.6348 gb]
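
A rough size estimate helps explain the gap between the two cases. The sketch below assumes the stimuli are stored (or cast) as float32 and that the reported figures are GiB; neither is stated in the question, and the helper name tensor_bytes is made up for illustration.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: the data is (or is cast to) float32
GIB = 1024 ** 3


def tensor_bytes(shape, itemsize=BYTES_PER_FLOAT32):
    """Bytes needed to hold a dense tensor of the given shape."""
    return prod(shape) * itemsize


full_bytes = tensor_bytes((2352, 131072, 2))   # all of train_x
slice_bytes = tensor_bytes((20, 131072, 2))    # smaller_train_x
one_bytes = tensor_bytes((1, 131072, 2))       # a single stimulus

print(f"train_x:         {full_bytes / GIB:.4f} GiB")
print(f"smaller_train_x: {slice_bytes / GIB:.4f} GiB")
print(f"one stimulus:    {one_bytes / GIB:.4f} GiB")
```

Under these assumptions the full array comes to roughly 2.30 GiB, which is close to the difference between the two "current" readings above (2.7969 versus 0.5). That is consistent with, though not proof of, the whole NumPy array being copied to the GPU as one tensor before any batching happens, which would also match the "Failed copying input tensor ... _EagerConst" wording of the error. In that reading batch_size only controls how the already-resident tensor is sliced, so lowering it cannot make the initial copy fit.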

Converting train_x to TFRecords seems to be the best option: GPU memory then grows linearly with the batch size

# `dataset` is assumed to be the unbatched tf.data.Dataset read from the TFRecords
model.fit(dataset.batch(10), epochs=1, callbacks=[MemoryPrintingCallback()])
#GPU memory details [current: 0.5 gb, peak: 0.6348 gb]
model.fit(dataset.batch(100), epochs=1, callbacks=[MemoryPrintingCallback()])
#GPU memory details [current: 0.5 gb, peak: 0.7228 gb]
model.fit(dataset.batch(1000), epochs=1, callbacks=[MemoryPrintingCallback()])
#GPU memory details [current: 0.5 gb, peak: 1.6026 gb]
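
The "linear" claim can be checked directly from the three TFRecord readings. This is a small sketch using only the numbers printed above; the per-sample comparison again assumes float32 stimuli of shape (131072, 2) and GiB units.

```python
# Peak GPU memory (as printed above) for each TFRecord batch size.
peaks = {10: 0.6348, 100: 0.7228, 1000: 1.6026}

sizes = sorted(peaks)
pairs = list(zip(sizes, sizes[1:]))

# Extra peak memory per additional sample in the batch, for each interval.
slopes = [(peaks[b] - peaks[a]) / (b - a) for a, b in pairs]
for (a, b), slope in zip(pairs, slopes):
    print(f"batch {a} -> {b}: {slope:.6f} gb per extra sample")

# Assumption: one stimulus is float32 with shape (131072, 2), reported in GiB.
per_sample_gib = 131072 * 2 * 4 / 1024 ** 3
print(f"one float32 stimulus: {per_sample_gib:.6f} GiB")
```

Both intervals give nearly the same slope, and that slope is close to the size of one float32 stimulus, which suggests that with a tf.data pipeline only the current batch is resident on the GPU.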

MemoryPrintingCallback(): see the linked question on printing memory usage during Keras's model.fit()

numpy-to-tfrecords: Numpy to TFrecords: Is there a simpler way to handle batch inputs from tfrecords?
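
For completeness, here is one possible way to get from NumPy arrays to the unbatched dataset used in the TFRecord runs. It is a sketch under assumptions, not the code used for the measurements: the function names write_tfrecords and read_tfrecords, the feature keys "x" and "y", and the tiny demo arrays are all invented for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Each record stores the stimulus as a serialized tensor and the label as a float.
FEATURE_SPEC = {
    "x": tf.io.FixedLenFeature([], tf.string),
    "y": tf.io.FixedLenFeature([], tf.float32),
}


def write_tfrecords(path, xs, ys):
    """Write one tf.train.Example per (x, y) pair."""
    with tf.io.TFRecordWriter(path) as writer:
        for x, y in zip(xs, ys):
            x_bytes = tf.io.serialize_tensor(tf.constant(x, tf.float32)).numpy()
            example = tf.train.Example(features=tf.train.Features(feature={
                "x": tf.train.Feature(bytes_list=tf.train.BytesList(value=[x_bytes])),
                "y": tf.train.Feature(float_list=tf.train.FloatList(value=[float(y)])),
            }))
            writer.write(example.SerializeToString())


def read_tfrecords(path, sample_shape):
    """Return an unbatched dataset of (x, y) pairs read lazily from disk."""
    def parse(record):
        parsed = tf.io.parse_single_example(record, FEATURE_SPEC)
        x = tf.io.parse_tensor(parsed["x"], out_type=tf.float32)
        # parse_tensor loses the static shape inside map(), so restore it.
        return tf.ensure_shape(x, sample_shape), parsed["y"]
    return tf.data.TFRecordDataset(path).map(parse)


# Tiny stand-in data; the real stimuli would have shape (131072, 2).
demo_x = np.random.rand(8, 16, 2).astype(np.float32)
demo_y = np.random.rand(8).astype(np.float32)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")

write_tfrecords(demo_path, demo_x, demo_y)
dataset = read_tfrecords(demo_path, sample_shape=(16, 2))
first_x, first_y = next(iter(dataset.batch(4)))
print(first_x.shape, first_y.shape)
```

With the real data, the resulting dataset would be passed as model.fit(dataset.batch(10), epochs=1), so that records are read and copied to the GPU one batch at a time rather than as a single 2352-sample tensor.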
