Loading data from DICOM files for a Torchvision model



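I apologise if this question is too basic, but I am just getting started with PyTorch (and Python).

I am trying to follow the instructions here step by step: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

However, I am working with DICOM files, which I keep in two directories (CANCER/NOCANCER). I split them with split-folders so that they are structured for use with the ImageFolder dataset (as done in the tutorial).

I know that I only need to load the pixel_arrays extracted from the DICOM files, and I have written some helper functions to:
  1. collect the paths of all the .dcm files;
  2. read them and extract the pixel_array;
  3. do a little preprocessing.

Here is an outline of the helper functions: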
import os
import pydicom
import cv2
import numpy as np


def createListFiles(dirName):
    """Return the paths of all .dcm files found under dirName."""
    print("Fetching all the files in the data directory...")
    lstFilesDCM = []
    for root, _dirs, fileList in os.walk(dirName):
        for filename in fileList:
            # match on the extension only, not anywhere in the name
            if filename.lower().endswith(".dcm"):
                lstFilesDCM.append(os.path.join(root, filename))
    return lstFilesDCM

def castHeight(lstFilesDCM):
    """Return the smallest image height (rows) across the given DICOM files."""
    lstHeight = []
    for filenameDCM in lstFilesDCM:
        # dcmread replaces the deprecated pydicom.read_file
        readfile = pydicom.dcmread(filenameDCM)
        lstHeight.append(readfile.pixel_array.shape[0])
    return np.min(lstHeight)


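def castWidth(lstFilesDCM):
    """Return the smallest image width (columns) across the given DICOM files."""
    lstWidth = []
    for filenameDCM in lstFilesDCM:
        # dcmread replaces the deprecated pydicom.read_file
        readfile = pydicom.dcmread(filenameDCM)
        lstWidth.append(readfile.pixel_array.shape[1])
    return np.min(lstWidth)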


def Preproc1(listDCM):
    """Load, normalise and resize every DICOM file into one float32 array."""
    new_height, new_width = castHeight(listDCM), castWidth(listDCM)
    ConstPixelDims = (len(listDCM), int(new_height), int(new_width))

    ArrayDCM = np.zeros(ConstPixelDims, dtype=np.float32)

    ## loop through all the DICOM files
    for i, filenameDCM in enumerate(listDCM):
        ## read the file
        ds = pydicom.dcmread(filenameDCM)

        mx0 = ds.pixel_array

        ## Standardisation
        imgb = mx0.astype('float32')
        imgb_stand = (imgb - imgb.mean(axis=(0, 1), keepdims=True)) / imgb.std(axis=(0, 1), keepdims=True)

        ## Normalisation
        imgb_norm = cv2.normalize(imgb_stand, None, 0, 1, cv2.NORM_MINMAX)

        ## make sure the data is held as a numpy array
        data = np.array(imgb_norm)

        ## resize to the common size and store it in ArrayDCM
        ArrayDCM[i, :, :] = cv2.resize(data, (int(new_width), int(new_height)), interpolation=cv2.INTER_CUBIC)

    return ArrayDCM

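So now, how do I tell the dataloader to load the data, given that the folders are structured for labelling purposes, but only after this extraction and preprocessing has been done? I am referring to the "Load Data" part of the tutorial, which is: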

# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=4) for x in ['train', 'val']}

If it makes sense, would it be possible to do something along the lines of the following?

image_datasets = {x: datasets.ImageFolder(Preproc1(os.path.join(data_dir, x)), data_transforms[x]) for x in ['train', 'val']}

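Also, another question: is it worth doing the normalisation step in my own preprocessing when the tutorial suggests applying transforms.Normalize?

I am really sorry if this sounds vague. I have been trying to solve this for weeks now, without success.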

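It sounds like you would be better off implementing your own custom Dataset. Indeed, I think it would be better to defer the normalisation and the rest of the preprocessing to the transforms that are applied to each image just before it is fed to the model.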
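
As a rough illustration of that suggestion, here is a minimal sketch of a custom Dataset that mimics the ImageFolder layout (one sub-directory per class) but reads .dcm files. The names DicomFolder and read_dicom_pixels are hypothetical, not part of PyTorch or pydicom, and the sketch assumes each file holds a single 2-D grayscale image.

```python
import os

import numpy as np
import torch
from torch.utils.data import Dataset


def read_dicom_pixels(path):
    """Read one DICOM file and return its pixel data as a float32 array."""
    import pydicom  # imported lazily so the class itself does not require it
    return pydicom.dcmread(path).pixel_array.astype(np.float32)


class DicomFolder(Dataset):
    """Hypothetical ImageFolder-like dataset: one sub-directory per class."""

    def __init__(self, root, transform=None, reader=read_dicom_pixels):
        # Class names are the sub-directory names, sorted so that the
        # label indices are stable (e.g. CANCER -> 0, NOCANCER -> 1).
        self.classes = sorted(
            d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
        )
        # Only the paths and labels are collected here; pixels are read lazily.
        self.samples = []
        for label, cls in enumerate(self.classes):
            for dirpath, _dirs, filenames in os.walk(os.path.join(root, cls)):
                for name in sorted(filenames):
                    if name.lower().endswith(".dcm"):
                        self.samples.append((os.path.join(dirpath, name), label))
        self.transform = transform
        self.reader = reader

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        # Add a channel dimension: (H, W) -> (1, H, W).
        image = torch.from_numpy(self.reader(path)).unsqueeze(0)
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```

With something like this, the tutorial line would become `image_datasets = {x: DicomFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}`, and the DataLoader line stays as it is. Two caveats: the transforms then receive a tensor rather than a PIL image, so they have to be ones that accept tensors, and the pretrained torchvision models expect 3-channel input, so a single-channel image may need to be repeated along the channel axis (for example with `image.repeat(3, 1, 1)`). torchvision's `datasets.DatasetFolder`, which takes a `loader` function and an `extensions` tuple, may be an alternative to writing the class by hand.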

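On the normalisation question, deferring it to the transforms could look roughly like the sketch below: a small callable that rescales each image to [0, 1] when it is loaded. PerImageMinMax is a hypothetical name and not a torchvision transform; the sketch assumes the image arrives as a tensor.

```python
import torch


class PerImageMinMax:
    """Hypothetical transform: rescale a single image tensor to [0, 1]."""

    def __init__(self, eps=1e-8):
        # eps guards against division by zero for constant images.
        self.eps = eps

    def __call__(self, image):
        image = image.float()
        lo, hi = image.min(), image.max()
        return (image - lo) / (hi - lo + self.eps)


# Min-max scaling is unaffected by a preceding per-image standardisation,
# because subtracting the mean and dividing by the standard deviation is an
# affine map, which min-max scaling undoes.
x = torch.tensor([[0.0, 5.0], [10.0, 20.0]])
scaled = PerImageMinMax()(x)
scaled_after_standardising = PerImageMinMax()((x - x.mean()) / x.std())
print(torch.allclose(scaled, scaled_after_standardising, atol=1e-5))
```

An instance can be placed in a `transforms.Compose([...])` list ahead of `transforms.Normalize`, which then applies whatever fixed mean and standard deviation you choose. The comparison at the end also suggests that the standardisation step in Preproc1 has no effect on the result once the min-max normalisation is applied after it, so one of the two steps could probably be dropped.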