Encoding problem when reading a DICOM repository



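I'm preprocessing a repository of DICOM images so I can feed them to a convolutional neural network, but when I try to read the repository it throws the following error: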

LookupError: unknown encoding: ISO 2022 IR 100

Here is the code I'm using:

import os

listoflists = []
for x in range(1, 10):
    # Patient folders ...50.4.0001 to ...50.4.0009
    data_path = "/home/lorenzo_f/CT COLONOGRAPHY/1.3.6.1.4.1.9328.50.4.000%d" % x
    output_path = "/home/lorenzo_f/output/"
    subfolders = [f.path for f in os.scandir(data_path) if f.is_dir()]
    subfolder = [f.path for f in os.scandir(subfolders[0]) if f.is_dir()]
    # Start a fresh list for each patient (and avoid shadowing the built-in `list`)
    scans = []
    scans.append(load_scan(subfolder[0]))
    scans.append(load_scan(subfolder[1]))
    listoflists.append(scans)

using the function load_scan:

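import os

import numpy as np
import pydicom as dicom


# Loop over the image files and store everything into a list.
def load_scan(path):
    slices = [dicom.dcmread(os.path.join(path, s)) for s in os.listdir(path)]
    # slices[0].SpecificCharacterSet = 'latin_1'
    # Order the slices by Instance Number (0020,0013).
    slices.sort(key=lambda x: int(x.InstanceNumber))
    try:
        # Slice spacing from the z component of Image Position (Patient).
        slice_thickness = np.abs(slices[0].ImagePositionPatient[2] - slices[1].ImagePositionPatient[2])
    except AttributeError:
        # Fall back to Slice Location (0020,1041) if Image Position is missing.
        slice_thickness = np.abs(slices[0].SliceLocation - slices[1].SliceLocation)
    for s in slices:
        s.SliceThickness = slice_thickness
    return slices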

I can't find the tag you mentioned. Here is the code I used and the first entry of the output:

data_path = "/home/lorenzo_f/CT COLONOGRAPHY/1.3.6.1.4.1.9328.50.4.0010"
output_path = "/home/lorenzo_f/output/"
subfolders = [f.path for f in os.scandir(data_path) if f.is_dir()]
subfolder = [f.path for f in os.scandir(subfolders[0]) if f.is_dir()]
ds = load_scan(subfolder[0])
ds
(0008, 0008) Image Type                          CS: ['ORIGINAL', 'SECONDARY', 'AXIAL']
(0008, 0016) SOP Class UID                       UI: CT Image Storage
(0008, 0018) SOP Instance UID                    UI: 1.3.6.1.4.1.9328.50.4.9867
(0008, 0020) Study Date                          DA: '20000101'
(0008, 0021) Series Date                         DA: '20000101'
(0008, 0022) Acquisition Date                    DA: '20000101'
(0008, 0023) Content Date                        DA: '20000101'
(0008, 0030) Study Time                          TM: '091936'
(0008, 0032) Acquisition Time                    TM: '092131'
(0008, 0033) Content Time                        TM: '101416'
(0008, 0050) Accession Number                    SH: ''
(0008, 0060) Modality                            CS: 'CT'
(0008, 0070) Manufacturer                        LO: 'GE MEDICAL SYSTEMS'
(0008, 0080) Institution Name                    LO: ''
(0008, 0081) Institution Address                 ST: ''
(0008, 0090) Referring Physician's Name          PN: 'xDONEx'
(0008, 1030) Study Description                   LO: 'CT COLONOGRAP C'
(0008, 103e) Series Description                  LO: 'CT COLONOGRAPHY'
(0008, 1048) Physician(s) of Record              PN: ' '
(0008, 1090) Manufacturer's Model Name           LO: 'LightSpeed16'
(0008, 1140)  Referenced Image Sequence   0 item(s) ---- 
(0008, 2112)  Source Image Sequence   0 item(s) ---- 
(0010, 0010) Patient's Name                      PN: '1.3.6.1.4.1.9328.50.4.0010'
(0010, 0020) Patient ID                          LO: '1.3.6.1.4.1.9328.50.4.0010'
(0010, 0030) Patient's Birth Date                DA: ''
(0010, 0040) Patient's Sex                       CS: 'M'
(0010, 1000) Other Patient IDs                   LO: ''
(0010, 1010) Patient's Age                       AS: '068Y'
(0010, 21b0) Additional Patient History          LT: 'COLON SCREENING'
(0010, 21c0) Pregnancy Status                    US: []
(0012, 0010) Clinical Trial Sponsor Name         LO: ''
(0012, 0020) Clinical Trial Protocol ID          LO: ''
(0012, 0021) Clinical Trial Protocol Name        LO: ''
(0012, 0030) Clinical Trial Site ID              LO: ''
(0012, 0031) Clinical Trial Site Name            LO: ''
(0012, 0040) Clinical Trial Subject ID           LO: ''
(0012, 0042) Clinical Trial Subject Reading ID   LO: ''
(0013, 0010) Private Creator                     LO: 'CTP'
(0013, 1010) Private tag data                    UN: b'CT COLONOGRAPHY\x00'
(0013, 1013) Private tag data                    UN: b'70093008'
(0018, 0015) Body Part Examined                  CS: 'COLON'
(0018, 0022) Scan Options                        CS: 'HELICAL MODE'
(0018, 0050) Slice Thickness                     DS: '0.7999999999999989'
(0018, 0060) KVP                                 DS: '120'
(0018, 0090) Data Collection Diameter            DS: '500.000000'
(0018, 1020) Software Version(s)                 LO: 'LightSpeedverrel'
(0018, 1030) Protocol Name                       LO: '6.10 CT  COLONOGRAPHY'
(0018, 1100) Reconstruction Diameter             DS: '330.000000'
(0018, 1110) Distance Source to Detector         DS: '949.075012'
(0018, 1111) Distance Source to Patient          DS: '541.000000'
(0018, 1120) Gantry/Detector Tilt                DS: '0.000000'
(0018, 1130) Table Height                        DS: '167.199997'
(0018, 1140) Rotation Direction                  CS: 'CW'
(0018, 1150) Exposure Time                       IS: '526'
(0018, 1151) X-Ray Tube Current                  IS: '140'
(0018, 1152) Exposure                            IS: '2286'
(0018, 1160) Filter Type                         SH: 'BODY FILTER'
(0018, 1170) Generator Power                     IS: '16800'
(0018, 1190) Focal Spot(s)                       DS: '0.700000'
(0018, 1200) Date of Last Calibration            DA: ''
(0018, 1201) Time of Last Calibration            TM: ''
(0018, 1210) Convolution Kernel                  SH: 'STANDARD'
(0018, 5100) Patient Position                    CS: 'FFS'
(0020, 000d) Study Instance UID                  UI: 1.3.6.1.4.1.9328.50.4.9864
(0020, 000e) Series Instance UID                 UI: 1.3.6.1.4.1.9328.50.4.9865
(0020, 0010) Study ID                            SH: '1'
(0020, 0011) Series Number                       IS: '102'
(0020, 0012) Acquisition Number                  IS: '1'
(0020, 0013) Instance Number                     IS: '1'
(0020, 0032) Image Position (Patient)            DS: ['-165.000000', '-165.000000', '-8.335000']
(0020, 0037) Image Orientation (Patient)         DS: ['1.000000', '0.000000', '0.000000', '0.000000', '1.000000', '0.000000']
(0020, 0052) Frame of Reference UID              UI: 1.3.6.1.4.1.9328.50.4.9866
(0020, 1040) Position Reference Indicator        LO: 'XY'
(0020, 1041) Slice Location                      DS: '-8.335000'
(0028, 0002) Samples per Pixel                   US: 1
(0028, 0004) Photometric Interpretation          CS: 'MONOCHROME2'
(0028, 0010) Rows                                US: 512
(0028, 0011) Columns                             US: 512
(0028, 0030) Pixel Spacing                       DS: ['0.644531', '0.644531']
(0028, 0100) Bits Allocated                      US: 16
(0028, 0101) Bits Stored                         US: 16
(0028, 0102) High Bit                            US: 15
(0028, 0103) Pixel Representation                US: 1
(0028, 0120) Pixel Padding Value                 SS: -2000
(0028, 1050) Window Center                       DS: '40'
(0028, 1051) Window Width                        DS: '400'
(0028, 1052) Rescale Intercept                   DS: '-1024'
(0028, 1053) Rescale Slope                       DS: '1'
(0040, a124) UID                                 UI: ''
(0088, 0140) Storage Media File-set UID          UI: ''
(3006, 0024) Referenced Frame of Reference UID   UI: ''
(3006, 00c2) Related Frame of Reference UID      UI: ''
(7fe0, 0010) Pixel Data                          OW: Array of 524288 bytes

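I don't think your problem is related to the transfer syntax. The error message indicates that Specific Character Set (0008,0005) has the value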

ISO 2022 IR 100
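A minimal, stdlib-only sketch of where this LookupError comes from: Python's codec registry has no codec registered under the DICOM defined term, so a library has to translate the term into a Python codec name before decoding. The `DICOM_TO_PYTHON` dictionary is a hypothetical, deliberately incomplete mapping for illustration; the sample bytes are the Additional Patient History value from the dump above.

```python
import codecs

# Raw value of Specific Character Set (0008,0005) from the error message.
DICOM_TERM = "ISO 2022 IR 100"

# Passing the DICOM term straight to bytes.decode() reproduces the error,
# because Python has no codec registered under that name.
try:
    b"COLON SCREENING".decode(DICOM_TERM)
    error_message = None
except LookupError as exc:
    error_message = str(exc)
print(error_message)  # unknown encoding: ISO 2022 IR 100

# Hypothetical, incomplete mapping from DICOM defined terms to Python codec
# names (see PS3.3 C.12.1.1.2 for the real list). Mapping the ISO 2022 terms
# to a single codec is only correct when a value contains no escape sequences.
DICOM_TO_PYTHON = {
    "ISO_IR 6": "ascii",
    "ISO_IR 100": "latin_1",
    "ISO 2022 IR 6": "ascii",
    "ISO 2022 IR 100": "latin_1",
}

codec = codecs.lookup(DICOM_TO_PYTHON[DICOM_TERM])
print(codec.name)  # iso8859-1
```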

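The "ISO 2022" terms are only permitted when so-called Code Extension Techniques are used. This means that a single attribute value can contain characters taken from different character sets, with special byte sequences (defined in ISO 2022) switching between them.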
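To make "switching with special byte sequences" concrete, here is a small sketch using Python's built-in `iso2022_jp` codec. It is chosen only because it ships with the standard library; it is not the codec a DICOM reader would use for ISO 2022 IR 100. The encoded bytes contain one escape sequence before the Japanese characters and another that switches back to ASCII:

```python
# Python's iso2022_jp codec is used only to make ISO 2022 escape sequences
# visible; it is not what DICOM uses for ISO 2022 IR 100.
text = "abc日本"
encoded = text.encode("iso2022_jp")
print(encoded)

# ESC $ B switches to the two-byte JIS X 0208 set, ESC ( B switches back to ASCII.
assert encoded.startswith(b"abc\x1b$B")
assert encoded.endswith(b"\x1b(B")
print(encoded.count(b"\x1b"))  # 2

# Decoding with the same codec (i.e. honouring the escapes) restores the text.
assert encoded.decode("iso2022_jp") == text
```

A decoder that ignores the escapes and applies a single character set to the whole value would produce garbage, which is why these values are harder to handle than plain single-charset text.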

For reference, see PS3.3, C.12.1.1.2.

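Code Extension Techniques are relatively hard to handle and are therefore rarely used. In fact, this is (possibly) the first time I have seen such an object. It interests me too, so would you mind sharing which manufacturer and device created this image?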

How do you solve this? Good question. I'm not aware of any (Python) toolkit that can handle this kind of string encoding; maybe dcm4che can.

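If you only want to extract the pixel data, it may be worth trying to change the value of (0008,0005) to "ISO_IR 100". This may cause problems when reading metadata such as Patient's Name or Study Description, but none of the attributes that describe the pixel data encoding are affected by the character set, so it should work.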
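The substitution suggested above can be sketched without any DICOM library. As long as a value contains no ESC (0x1B) bytes, "ISO 2022 IR 100" text is ordinary Latin-1, so decoding it as "ISO_IR 100" yields the same string. `decode_text_value` and its mapping are hypothetical names for this illustration only. In pydicom the corresponding change would be an assignment such as `ds.SpecificCharacterSet = "ISO_IR 100"`; whether that has to happen before other text attributes are accessed depends on the library version, so treat that part as an assumption.

```python
# Hypothetical helper (not part of any DICOM library) illustrating the
# workaround: treat "ISO 2022 IR 100" as "ISO_IR 100" when no escapes occur.
ESC = b"\x1b"

# Illustrative subset of defined terms; an empty or missing value means the
# default repertoire (ISO_IR 6, i.e. ASCII).
DICOM_TO_PYTHON = {
    "": "ascii",
    "ISO_IR 6": "ascii",
    "ISO_IR 100": "latin_1",
}


def decode_text_value(raw: bytes, specific_character_set: str) -> str:
    """Decode a raw DICOM text value using a single, non-extended charset."""
    term = specific_character_set.strip()
    if term == "ISO 2022 IR 100":
        if ESC in raw:
            # Real code extensions: a single-codec decode would be wrong here.
            raise ValueError("value uses ISO 2022 escape sequences")
        # No escapes: the bytes are plain Latin-1, same as ISO_IR 100.
        term = "ISO_IR 100"
    return raw.decode(DICOM_TO_PYTHON[term])


print(decode_text_value(b"M\xfcller", "ISO 2022 IR 100"))  # Müller
print(decode_text_value(b"CT COLONOGRAPHY", ""))  # CT COLONOGRAPHY
```

Raising on escape sequences, rather than silently decoding them, keeps the shortcut from corrupting values that really do switch character sets.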
