Why does my multiclass segmentation, using tensorflow-gpu as the Keras backend, only use 2% of the GPU during training?



I am trying to run my multiclass image segmentation problem using tensorflow-gpu as the backend for Keras, both on a single GPU and on multiple GPUs. I have found that training runs extremely slowly, and when I look at utilisation I can see the GPU is barely being used, at around 2%. I have roughly 10,000 images and masks, each of shape (224x224x3), and I convert the masks into a one-hot encoded structure suitable for categorical training, so that I have four classes and masks of shape (224x224x4). I am using a standard U-Net encoder/decoder architecture. Using the Sequence class, I have written my own custom generator, which fetches the images and masks and preprocesses them. I wonder whether training is slow because my custom generator is some kind of bottleneck in the pipeline? Am I doing too much preprocessing in the generator itself (i.e. resizing the images, etc.)? I cannot explain why training takes so long. Below I include three scripts: 1. the unet model, 2. the custom generator, and 3. the segmentation script that compiles the model, trains it, and calls the generator. Any help in explaining why this is happening would be greatly appreciated.
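
One quick sanity check (a minimal sketch, reusing the generator from script 2 below and the id lists from script 3; the 10-batch sample size is arbitrary) would be to time the generator on its own, with no GPU involved, and compare against the per-step time Keras prints:

import time

# If the generator alone takes about as long per batch as a full training
# step, the input pipeline, not the model, is the bottleneck.
gen = DataGenerator(trainIds, trainMasks, imagePath, maskPath)
start = time.time()
for i in range(10):
    x, y = gen[i]    # keras.utils.Sequence supports indexing via __getitem__
print('mean seconds per batch:', (time.time() - start) / 10)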

I also believe I am using tensorflow-gpu and the available GPU correctly, because I get the following messages:

GPU Prolog Script v0.30
This is a GPU node.
Enough GPUs available.
Allocating card 1
2020-03-05 10:40:05.996313: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-03-05 10:40:06.078021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:03:00.0
2020-03-05 10:40:06.127190: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-05 10:40:06.221801: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-03-05 10:40:06.296413: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-03-05 10:40:06.379031: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-03-05 10:40:06.429316: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-03-05 10:40:06.485672: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-03-05 10:40:06.791850: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-05 10:40:06.796626: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-05 10:40:06.797199: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-05 10:40:06.813236: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400010000 Hz
2020-03-05 10:40:06.815750: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x535a0a0 executing computations on platform Host. Devices:
2020-03-05 10:40:06.815778: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2020-03-05 10:40:07.000335: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x53bd360 executing computations on platform CUDA. Devices:
2020-03-05 10:40:07.000385: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2020-03-05 10:40:07.002638: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:03:00.0
2020-03-05 10:40:07.002714: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-05 10:40:07.002747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-03-05 10:40:07.002774: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-03-05 10:40:07.002802: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-03-05 10:40:07.002829: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-03-05 10:40:07.002856: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-03-05 10:40:07.002884: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-05 10:40:07.010122: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-05 10:40:07.023584: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-03-05 10:40:07.026875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-05 10:40:07.026902: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-03-05 10:40:07.026919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-03-05 10:40:07.034045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10481 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2020-03-05 10:54:36.697783: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-05 10:54:39.743744: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
1.

import tensorflow as tf
from tensorflow import keras
import numpy as np

class Unet():
    def __init__(self, imgDims, nOutput=1, finalActivation='sigmoid', activation='relu', padding='same'):
        self.imgDims = imgDims
        self.activation = activation
        self.finalActivation = finalActivation
        self.padding = padding
        self.nOutput = nOutput

    def convBlocks(self, x, filters, kernelSize=(3,3), padding='same', strides=1):
        x = keras.layers.BatchNormalization()(x)
        x = keras.layers.Activation(self.activation)(x)
        x = keras.layers.Conv2D(filters, kernelSize, padding=padding, strides=strides)(x)
        return x

    def identity(self, x, xInput, f, padding='same', strides=1):
        skip = keras.layers.Conv2D(f, kernel_size=(1, 1), padding=padding, strides=strides)(xInput)
        skip = keras.layers.BatchNormalization()(skip)
        output = keras.layers.Add()([skip, x])
        return output

    def residualBlock(self, xIn, f, stride):
        res = self.convBlocks(xIn, f, strides=stride)
        res = self.convBlocks(res, f, strides=1)
        output = self.identity(res, xIn, f, strides=stride)
        return output

    def upSampling(self, x, xInput):
        x = keras.layers.UpSampling2D((2,2))(x)
        x = keras.layers.Concatenate()([x, xInput])
        return x

    def encoder(self, x, filters, kernelSize=(3,3), padding='same', strides=1):
        e1 = keras.layers.Conv2D(filters[0], kernelSize, padding=padding, strides=strides)(x)
        e1 = self.convBlocks(e1, filters[0])
        shortcut = keras.layers.Conv2D(filters[0], kernel_size=(1, 1), padding=padding, strides=strides)(x)
        shortcut = keras.layers.BatchNormalization()(shortcut)
        e1Output = keras.layers.Add()([e1, shortcut])
        e2 = self.residualBlock(e1Output, filters[1], stride=2)
        e3 = self.residualBlock(e2, filters[2], stride=2)
        e4 = self.residualBlock(e3, filters[3], stride=2)
        e5 = self.residualBlock(e4, filters[4], stride=2)
        return e1Output, e2, e3, e4, e5

    def bridge(self, x, filters):
        b1 = self.convBlocks(x, filters, strides=1)
        b2 = self.convBlocks(b1, filters, strides=1)
        return b2

    def decoder(self, b2, e1, e2, e3, e4, filters, kernelSize=(3,3), padding='same', strides=1):
        x = self.upSampling(b2, e4)
        d1 = self.convBlocks(x, filters[4])
        d1 = self.convBlocks(d1, filters[4])
        d1 = self.identity(d1, x, filters[4])
        x = self.upSampling(d1, e3)
        d2 = self.convBlocks(x, filters[3])
        d2 = self.convBlocks(d2, filters[3])
        d2 = self.identity(d2, x, filters[3])
        x = self.upSampling(d2, e2)
        d3 = self.convBlocks(x, filters[2])
        d3 = self.convBlocks(d3, filters[2])
        d3 = self.identity(d3, x, filters[2])
        x = self.upSampling(d3, e1)
        d4 = self.convBlocks(x, filters[1])
        d4 = self.convBlocks(d4, filters[1])
        d4 = self.identity(d4, x, filters[1])
        return d4

    def ResUnet(self, filters=[16, 32, 64, 128, 256]):
        inputs = keras.layers.Input((self.imgDims, self.imgDims, 3))
        e1, e2, e3, e4, e5 = self.encoder(inputs, filters)
        b2 = self.bridge(e5, filters[4])
        d4 = self.decoder(b2, e1, e2, e3, e4, filters)
        x = keras.layers.Conv2D(self.nOutput, (1, 1), padding='same', activation=self.finalActivation)(d4)
        model = keras.models.Model(inputs, x)
        return model

2.

import cv2
import os
import numpy as np
from tensorflow import keras
from skimage import img_as_bool
from skimage.transform import resize

class DataGenerator(keras.utils.Sequence):
    def __init__(self, imgIds, maskIds, imagePath, maskPath, weights=[1,1,1,1],
                 batchSize=16, imageSize=(224, 224, 3), nClasses=4, shuffle=False):
        self.imgIds = imgIds
        self.maskIds = maskIds
        self.imagePath = imagePath
        self.maskPath = maskPath
        self.weights = np.array(weights)
        self.batchSize = batchSize
        self.imageSize = imageSize
        self.nClasses = nClasses
        self.shuffle = shuffle

    def __load__(self, imgName, maskName):
        '''
        for each image id load the patch and corresponding mask
        '''
        img = cv2.imread(os.path.join(self.imagePath, imgName))
        img = cv2.resize(img, (self.imageSize[0], self.imageSize[1]))
        img = img/255.0
        mask = cv2.imread(os.path.join(self.maskPath, maskName))
        mask = img_as_bool(resize(mask, (self.imageSize[0], self.imageSize[1])))
        # add a fourth (background) channel and flag pixels that belong to no class
        mask = np.dstack((mask, np.zeros((self.imageSize[0], self.imageSize[1]))))
        mask = mask.astype('uint16')
        mask[:,:,3][mask[:,:,0]==0] = 1
        # weightMasks (not shown here) applies self.weights to the one-hot mask
        mask = self.weightMasks(mask)
        return (img, mask)

    def __getitem__(self, index):
        '''
        get the files for each batch (override __getitem__ method)
        '''
        # clamp the last, possibly smaller batch without mutating self.batchSize,
        # otherwise every later epoch indexes with the shrunken batch size
        start = index * self.batchSize
        end = min((index + 1) * self.batchSize, len(self.imgIds))
        batchImgs = self.imgIds[start:end]
        batchMasks = self.maskIds[start:end]
        batchfiles = [self.__load__(imgFile, maskFile)
                      for imgFile, maskFile in zip(batchImgs, batchMasks)]
        images, masks = zip(*batchfiles)
        return np.array(list(images)), np.array(list(masks))

    def __len__(self):
        '''
        Return the number of steps per epoch (override __len__ method)
        '''
        return int(np.ceil(len(self.imgIds)/self.batchSize))

3.

import os
import csv
import cv2
import glob
import numpy as np
import pickle
import random
import argparse
import itertools
import json
import tensorflow as tf
from sklearn.utils import class_weight
from tensorflow import keras
from skimage.transform import resize
from skimage import img_as_bool
from tensorflow.keras import backend as K
from scripts.resunet_multi import Unet
from scripts.fcn8 import FCN
from scripts.utilities import saveModel, saveHistory
from scripts.evaluation import dice_coef_loss, dice_coef
from scripts.custom_datagenerator_three import DataGenerator
from scripts.custom_loss_functions import weightedCatXEntropy

def getPrediction(model, validGenerator, validIds):
    # collect the ground-truth masks and the argmax predictions per batch
    masks, predictions = [], []
    steps = len(validIds)//validGenerator.batchSize
    for i in range(steps):
        x, y = validGenerator.__getitem__(i)
        y[y==1] = 255
        masks.append(y)
        yPred = model.predict(x)
        yPred = np.argmax(yPred, axis=3)
        predictions.append(yPred)
    return masks, predictions

def trainSegmentationModel(args):
    basePath = args['basepath']
    imageDir = args['imagedir']
    maskDir = args['maskdir']
    if args['weightfile'] is not None:
        with open(args['weightfile'], 'r') as txtFile:
            weights = list(csv.reader(txtFile, delimiter=','))
    with open(args['paramfile']) as jsonFile:
        params = json.load(jsonFile)
    print(params['nClasses'])
    if args['model'] == 'unet':
        unet = Unet(int(params['imageDims']), nOutput=int(params['nClasses']), finalActivation=params['final'])
        model = unet.ResUnet()
    elif args['model'] == 'fcn8':
        fcn = FCN(int(params['imageDims']), nClasses=int(params['nClasses']), finalActivation=params['final'])
        model = fcn.getFCN8()
    epoch = int(params['epoch'])
    ratio = float(params['ratio'])
    imagePath = os.path.join(basePath, imageDir)
    maskPath = os.path.join(basePath, maskDir)
    imgIds = glob.glob(os.path.join(imagePath, '*'))
    imgIds = [os.path.basename(f) for f in imgIds][:200]
    maskIds = glob.glob(os.path.join(maskPath, '*'))
    maskIds = [os.path.basename(f) for f in maskIds][:200]
    trainNum = round(ratio*len(imgIds))
    validNum = np.floor((len(imgIds) - trainNum))
    trainIds = imgIds[:trainNum]
    validIds = imgIds[trainNum:]
    #testIds = imgIds[(trainNum+validNum):]
    trainMasks = maskIds[:trainNum]
    validMasks = maskIds[trainNum:]
    #testMasks = maskIds[(trainNum+validNum):]
    trainGenerator = DataGenerator(trainIds, trainMasks, imagePath, maskPath)
    validGenerator = DataGenerator(validIds, validMasks, imagePath, maskPath)
    #testGenerator = DataGenerator(testIds, validMasks, imagePath, maskPath)
    trainSteps = len(trainIds)//trainGenerator.batchSize
    validSteps = len(validIds)//validGenerator.batchSize
    if args['weightfile'] is None:
        # derive balanced class weights from the training masks
        labels = []
        for i in range(trainSteps):
            _, m = trainGenerator.__getitem__(i)
            mask = np.argmax(m, axis=3)
            labels.append(mask.reshape(-1))
        labels = [l.tolist() for l in labels]
        # materialise the chained labels so they can be consumed twice below
        labels = list(itertools.chain(*labels))
        weights = class_weight.compute_class_weight('balanced', np.unique(labels), labels)
    #learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False
    adam = keras.optimizers.Adam()
    model.compile(optimizer=adam, loss=weightedCatXEntropy, metrics=[dice_coef])
    history = model.fit_generator(trainGenerator,
                                  validation_data=validGenerator,
                                  steps_per_epoch=trainSteps,
                                  validation_steps=validSteps,
                                  verbose=1,
                                  epochs=epoch)
    saveModel(model, args['name'])
    saveHistory(history, args['name']+'_hist')
    #getPrediction(model, validGenerator, validIds)

if __name__ == '__main__':
    ap = argparse.ArgumentParser()
    ap.add_argument('-bp', '--basepath', required=True, help='path to image and mask directories')
    ap.add_argument('-ip', '--imagedir', required=True, help='path to image directory')
    ap.add_argument('-mp', '--maskdir', required=True, help='path to mask directory')
    ap.add_argument('-m', '--model', required=True, help='neural network model to use')
    ap.add_argument('-n', '--name', required=True, help='name to save the model with')
    ap.add_argument('-wf', '--weightfile', help='file containing list of class weights for unbalanced datasets')
    ap.add_argument('-pf', '--paramfile', help='file containing parameters')
    args = vars(ap.parse_args())
    trainSegmentationModel(args)

You could try profiling a training run. There is a good tutorial here: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras. Note that in some cases it is not entirely easy to follow and interpret, but it can also be very useful.
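
For instance, a minimal sketch of hooking the profiler into your existing training call (assuming TF 2.x, where the TensorBoard callback accepts a profile_batch argument; the './logs' directory is arbitrary):

import tensorflow as tf

# Profile one training batch; afterwards, open TensorBoard's "Profile" tab
# to see whether step time is dominated by waiting on the input pipeline.
tbCallback = tf.keras.callbacks.TensorBoard(log_dir='./logs', profile_batch=10)

history = model.fit_generator(trainGenerator,
                              validation_data=validGenerator,
                              steps_per_epoch=trainSteps,
                              validation_steps=validSteps,
                              verbose=1,
                              epochs=epoch,
                              callbacks=[tbCallback])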

One more tip: given that you apply several operations to the images and masks, I would seriously consider preprocessing the entire training and validation sets up front, so that in your generator you only need to read them from file and you are done. That way there is a good chance you save critical time on training (and validation) in every epoch.
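
A minimal sketch of that idea (the 'cache' directory is hypothetical; the resize and normalisation mirror what __load__ in script 2 currently repeats every epoch):

import os
import cv2
import numpy as np

# One-off pass: do the expensive decode/resize/normalise work once and
# cache float32 arrays, so the generator only has to np.load() them.
os.makedirs('cache/images', exist_ok=True)
for imgName in imgIds:
    img = cv2.imread(os.path.join(imagePath, imgName))
    img = cv2.resize(img, (224, 224)) / 255.0
    np.save(os.path.join('cache/images', imgName + '.npy'), img.astype('float32'))
# ...repeat the same idea for the masks, including the one-hot encoding...

# __load__ in the generator then shrinks to two cheap reads:
#     img = np.load(os.path.join(self.imagePath, imgName + '.npy'))
#     mask = np.load(os.path.join(self.maskPath, maskName + '.npy'))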

Hope it helps!
