Can the memory of two graphics cards be combined to run a larger neural network?



If I have a graphics card with 24 GB of memory, can I add a second, identical card and double the memory to 48 GB?

I want to run a large 3D-UNet, but it fails because of the size of the volumes I pass in. Would adding a second card let me work with larger volumes?

Update: I am running on Linux (Red Hat Enterprise Linux 8). My code can train on both GPUs.

**Code update:**

import tensorflow as tf
from tensorflow.keras.layers import (Input, Conv3D, Conv3DTranspose,
                                     MaxPooling3D, Dropout, concatenate)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint

# sample_width/height/depth, dice_coef, dice_coef_loss, observe_var,
# train_x and train_y are defined elsewhere in the original script.

def get_model(optimizer, loss_metric, metrics, lr=1e-3):
    inputs = Input((sample_width, sample_height, sample_depth, 1))
    # Encoder (contracting path) placed on the first GPU
    with tf.device('/device:gpu:0'):
        conv1 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(inputs)
        conv1 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(conv1)
        pool1 = MaxPooling3D(pool_size=(2, 2, 2))(conv1)
        drop1 = Dropout(0.5)(pool1)
        conv2 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(drop1)
        conv2 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(conv2)
        pool2 = MaxPooling3D(pool_size=(2, 2, 2))(conv2)
        drop2 = Dropout(0.5)(pool2)
        conv3 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(drop2)
        conv3 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(conv3)
        pool3 = MaxPooling3D(pool_size=(2, 2, 2))(conv3)
        drop3 = Dropout(0.3)(pool3)
        conv4 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(drop3)
        conv4 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(conv4)
        pool4 = MaxPooling3D(pool_size=(2, 2, 2))(conv4)
        drop4 = Dropout(0.3)(pool4)
        conv5 = Conv3D(512, (3, 3, 3), activation='relu', padding='same')(drop4)
        conv5 = Conv3D(512, (3, 3, 3), activation='relu', padding='same')(conv5)
    # Decoder (expanding path) placed on the second GPU
    with tf.device('/device:gpu:1'):
        up6 = concatenate([Conv3DTranspose(256, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv5), conv4], axis=4)
        conv6 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(up6)
        conv6 = Conv3D(256, (3, 3, 3), activation='relu', padding='same')(conv6)
        up7 = concatenate([Conv3DTranspose(128, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv6), conv3], axis=4)
        conv7 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(up7)
        conv7 = Conv3D(128, (3, 3, 3), activation='relu', padding='same')(conv7)
        up8 = concatenate([Conv3DTranspose(64, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv7), conv2], axis=4)
        conv8 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(up8)
        conv8 = Conv3D(64, (3, 3, 3), activation='relu', padding='same')(conv8)
        up9 = concatenate([Conv3DTranspose(32, (2, 2, 2), strides=(2, 2, 2), padding='same')(conv8), conv1], axis=4)
        conv9 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(up9)
        conv9 = Conv3D(32, (3, 3, 3), activation='relu', padding='same')(conv9)
        conv10 = Conv3D(1, (1, 1, 1), activation='sigmoid')(conv9)
    model = Model(inputs=[inputs], outputs=[conv10])
    model.compile(optimizer=optimizer(lr=lr), loss=loss_metric, metrics=metrics)
    return model

model = get_model(optimizer=Adam, loss_metric=dice_coef_loss, metrics=[dice_coef], lr=1e-3)
model_checkpoint = ModelCheckpoint('save.model', monitor=observe_var, save_best_only=False, period=1000)
model.fit(train_x, train_y, batch_size=1, epochs=2000, verbose=1, shuffle=True, validation_split=0.2, callbacks=[model_checkpoint])
model.save('final_save.model')
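
To confirm that the layers really land on both GPUs, device placement can be logged before building the model; a minimal check, assuming TensorFlow 2.x:

import tensorflow as tf

# Print which device every op is assigned to (enable before building the model)
tf.debugging.set_log_device_placement(True)

# With two cards installed this should list two GPU devices
print(tf.config.list_physical_devices('GPU'))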

I believe it is not currently possible to combine multiple GPUs into a single abstract GPU with the combined memory. However, you can do something similar: split the model across multiple GPUs. This still gives the desired result of being able to run a model larger than the memory of any single GPU.

The catch is that this requires manually specifying which parts of the model run on each device, which can be hard to do efficiently. I am also not sure how it could be done with ready-made models.

The general pattern looks like this:

with tf.device('/gpu:0'):
    # create half the model
with tf.device('/gpu:1'):
    # create the other half of the model
    # combine the two halves
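
As a concrete (hypothetical) version of that skeleton, here is a minimal Keras sketch that builds the first half of a network on one GPU and the second half on the other; the layer sizes are arbitrary and only illustrate the placement:

import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = tf.keras.Input(shape=(1024,))

# First half of the model on GPU 0
with tf.device('/gpu:0'):
    x = layers.Dense(4096, activation='relu')(inputs)
    x = layers.Dense(4096, activation='relu')(x)

# Second half on GPU 1; TensorFlow copies the activations between devices
with tf.device('/gpu:1'):
    x = layers.Dense(4096, activation='relu')(x)
    outputs = layers.Dense(10, activation='softmax')(x)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')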

Further reading:

  • Is it possible to split a network across multiple GPUs in TensorFlow?
  • tf.device

Some people are running the NeoX 20B model on two RTX 3090s. I don't know whether that was unsupported three years ago, but it is now: you won't see 48 GB as a single pool in the system, but Python/TensorFlow/Torch can use the VRAM of both cards. I am going to try putting two different cards together, a 3090 and a 3060, and see whether StableLM/NeoX fine-tuning runs fine.

https://youtu.be/bAY85Om5O6A
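
On the PyTorch side, the usual way to spread one large model over both cards is to let the framework shard the layers automatically. A sketch assuming the Hugging Face transformers and accelerate packages are installed; the checkpoint name and dtype are only examples:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neox-20b"  # example checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map="auto" shards the layers across all visible GPUs, so the weights
# occupy the combined VRAM of both cards rather than a single 48 GB pool.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)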

The short answer is yes, but in practice it comes down to the software you are using accessing the memory on your behalf. I don't know much about these operating systems, but I believe CUDA is probably the place to start looking.
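
For instance, each card still shows up as its own CUDA device with its own memory pool; a small check, assuming PyTorch with CUDA available:

import torch

# The driver does not merge two 24 GB cards into one 48 GB device;
# each GPU reports its own total memory.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")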
