Removing bad arrays from the training arrays



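I am trying to train a Sequential model for image classification. After inspecting the resulting X_train and y_train arrays, I found that some of the arrays in X_train are empty, and when I try to run fit_generator I get ValueError: setting an array element with a sequence.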

X_train has shape (2122,) and y_train has shape (2122, 28).
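A toy reproduction of why X_train ends up with shape (2122,) instead of (2122, H, W, 3). It assumes, as the function further down suggests, that a failed preprocessing step leaves None in place of an image; the 4x4 image size and the values are made up for illustration.

```python
import numpy as np

# Toy stand-in for X_train: two 4x4 RGB images and one failed entry (None)
images = [np.ones((4, 4, 3), dtype=np.float32), None,
          np.full((4, 4, 3), 0.5, dtype=np.float32)]

# A 1-D object array holding one Python object per sample, like X_train
X = np.empty(len(images), dtype=object)
for i, img in enumerate(images):
    X[i] = img

print(X.shape, X.dtype)  # (3,) object

# Turning X into a single (N, H, W, 3) float tensor fails, because the None
# entry does not have the same shape as the real images
try:
    np.stack(X)
    stack_failed = False
except ValueError as err:
    stack_failed = True
    print("np.stack failed:", err)
```

Once the bad entries are gone, np.stack on the remaining images yields the regular 4-D float32 array an image model is normally fed.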

How can I safely delete the empty objects from the X_train and y_train arrays when I know their indices?

X_train is produced by the following function:

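import cv2
import numpy as np

def normalize_image(image):
    try:
        # Resize to img_size x img_size and scale pixel values to [0, 1]
        return np.array(cv2.resize(image, (img_size, img_size))).astype("float32") / 255.0
    except Exception:
        # Any failure is swallowed, so the function implicitly returns None
        pass

X = np.array([normalize_image(img_data.get_pixels()) for img_data in imgs_data])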

It looks like this:

array([array([[[0.80784315, 0.84313726, 0.8784314 ],
[0.80784315, 0.84313726, 0.8745098 ],
[0.8039216 , 0.8509804 , 0.8666667 ],
...,
[0.77254903, 0.78431374, 0.8039216 ],
[0.7764706 , 0.7882353 , 0.80784315],
[0.78039217, 0.7921569 , 0.8117647 ]],
[[0.80784315, 0.84313726, 0.8784314 ],
[0.80784315, 0.84313726, 0.8745098 ],
[0.8039216 , 0.8509804 , 0.8666667 ],
...,
[0.77254903, 0.78431374, 0.8039216 ],
[0.76862746, 0.78039217, 0.8       ],
[0.77254903, 0.78431374, 0.8039216 ]],
[[0.80784315, 0.84313726, 0.8784314 ],
[0.80784315, 0.84313726, 0.8745098 ],
[0.8039216 , 0.8509804 , 0.8666667 ],
...,
[0.77254903, 0.78431374, 0.8039216 ],
[0.7764706 , 0.7882353 , 0.80784315],
[0.77254903, 0.7882353 , 0.8039216 ]],
...,
[[0.7921569 , 0.80784315, 0.8509804 ],
[0.79607844, 0.8117647 , 0.85490197],
[0.79607844, 0.8117647 , 0.85490197],
...,
[0.23529412, 0.39607844, 0.5254902 ],
[0.24313726, 0.39215687, 0.5294118 ],
[0.23921569, 0.3882353 , 0.5254902 ]],
[[0.7921569 , 0.80784315, 0.8509804 ],
[0.79607844, 0.8117647 , 0.85490197],
[0.79607844, 0.8117647 , 0.85490197],
...,
[0.23529412, 0.39607844, 0.5254902 ],
[0.24313726, 0.39215687, 0.5294118 ],
[0.23137255, 0.38039216, 0.5176471 ]],
[[0.7921569 , 0.80784315, 0.8509804 ],
[0.79607844, 0.8117647 , 0.85490197],
[0.79607844, 0.8117647 , 0.85490197],
...,
[0.22352941, 0.38431373, 0.5137255 ],
[0.24313726, 0.39215687, 0.5294118 ],
[0.24313726, 0.39215687, 0.5294118 ]]], dtype=float32),
...
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
...,
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
...,
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
...,
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]], dtype=float32)], dtype=object)
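For the case the question describes, where the indices of the bad entries are known, np.delete can remove the same positions from both arrays along axis 0. This is a minimal sketch: find_bad_indices and drop_indices are hypothetical helper names, and "bad" is assumed to mean an entry that is None or all zeros, like the all-zero image at the end of the dump above.

```python
import numpy as np

def find_bad_indices(X):
    """Positions whose entry is None or contains only zeros."""
    return [i for i, v in enumerate(X) if v is None or not np.any(v)]

def drop_indices(X, y, bad_indices):
    """Delete the same positions from X and y, then stack X into one float array."""
    X_kept = np.delete(X, bad_indices, axis=0)  # still a 1-D object array
    y_kept = np.delete(y, bad_indices, axis=0)  # matching rows of y removed
    return np.stack(list(X_kept)), y_kept

# Toy data: 4 samples with 28 label columns, as in y_train; samples 1 and 2 are bad
X = np.empty(4, dtype=object)
X[0] = np.ones((4, 4, 3), dtype=np.float32)
X[1] = None
X[2] = np.zeros((4, 4, 3), dtype=np.float32)
X[3] = np.full((4, 4, 3), 0.5, dtype=np.float32)
y = np.arange(4 * 28).reshape(4, 28)

bad = find_bad_indices(X)  # [1, 2]
X_clean, y_clean = drop_indices(X, y, bad)
print(X_clean.shape, y_clean.shape)  # (2, 4, 4, 3) (2, 28)
```

Because both arrays are cut with the same index list, row i of y_clean still labels image i of X_clean.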

Simply do the following:

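images = [normalize_image(img_data.get_pixels()) for img_data in imgs_data]
X = np.array([v for v in images if v is not None and np.any(v)])

This drops the entries where normalize_image failed (and returned None) as well as every image whose pixels are all 0.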
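A toy check of that filter, using a hypothetical is_usable helper and made-up 4x4 images. It also shows why the test should look at the whole image: an image whose first row happens to be black would fail a check on only the first row (v[0]) even though the rest of it has content.

```python
import numpy as np

def is_usable(v):
    """True if preprocessing produced an array that is not entirely zero."""
    return v is not None and bool(np.any(v))

black_top = np.ones((4, 4, 3), dtype=np.float32)
black_top[0] = 0.0  # first row black, the rest of the image is not
all_black = np.zeros((4, 4, 3), dtype=np.float32)
results = [black_top, None, all_black]

X = np.array([v for v in results if is_usable(v)])
print(X.shape)  # (1, 4, 4, 3)

# For comparison: testing only the first row would wrongly reject black_top
print(np.sum(black_top[0]) > 0)  # False
```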

Edit:

I guess you also need to do the same for y, so you can do the following to handle both variables at once:

pairs = [(img, y) for img, y in ((normalize_image(x.get_pixels()), y) for x, y in zip(imgs_data, Y))
         if img is not None and np.any(img)]

After that you can create X and Y:

X = np.array([x[0] for x in pairs])
Y = np.array([x[1] for x in pairs])
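The same alignment can also be had without building tuples: compute one boolean mask from the normalized images and apply it to both X and Y. A sketch, assuming Y is (or converts to) an array with one row per sample, such as the (2122, 28) y_train; filter_dataset is a hypothetical helper, and in practice images would be the list [normalize_image(x.get_pixels()) for x in imgs_data].

```python
import numpy as np

def filter_dataset(images, Y):
    """Drop failed (None) or all-zero images and the matching rows of Y."""
    if len(images) != len(Y):
        raise ValueError("images and Y must have the same length")
    keep = np.array([v is not None and bool(np.any(v)) for v in images], dtype=bool)
    if not keep.any():
        raise ValueError("no usable images left after filtering")
    X = np.stack([v for v, k in zip(images, keep) if k])  # (N_kept, H, W, 3)
    return X, np.asarray(Y)[keep]                         # same rows of Y

# Toy data: samples 1 and 2 are bad
images = [np.ones((2, 2, 3), dtype=np.float32), None,
          np.zeros((2, 2, 3), dtype=np.float32), np.full((2, 2, 3), 0.5, dtype=np.float32)]
Y = np.arange(4 * 28).reshape(4, 28)

X, Y_kept = filter_dataset(images, Y)
print(X.shape, Y_kept.shape)  # (2, 2, 2, 3) (2, 28)
```

Computing the mask once keeps X and Y aligned by construction, and the length check catches a mismatch before it turns into silently shifted labels.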