PyTorch crashes during training: possible image decoding error, tensor values, corrupted images (RuntimeError)



Premise

I am fairly new to PyTorch, and while training my neural network on a small custom dataset (10 images spanning 90 classes), I frequently get a segfault.

The output below comes from these print statements, run twice (once on the MNIST dataset at idx 0, once on my custom dataset at idx 0). Both datasets were compiled with identically formatted csv files (img_name, class) and image directories. The MNIST subset contains 30 images, my custom dataset 10:

example, label = dataset[0]
print(dataset[0])
print(example.shape)
print(label)

The first tensor is from a 28x28 MNIST png, converted to a tensor using:

image = torchvision.io.read_image(path=img_path).type(torch.FloatTensor)

This gives me a known-working dataset to compare against. It uses the same custom Dataset class as my custom data.

The NeuralNetwork class is identical to the one for my custom data, except that it has 10 outputs instead of my custom data's 90.

The custom images vary in size; all are resized to 28x28 with a transform (the Compose() is listed below). Within this 10-image subset there are images of size 800x170, 96x66, 64x34, 208x66, and so on.

The second tensor output comes from the 800x170 png.

The transforms applied to both datasets are identical:

tf = transforms.Compose([
    transforms.Resize(size=(28, 28)),
    transforms.Normalize(mean=[-0.5/0.5], std=[1/0.5])
])

No target transform is applied.
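
As a side note on what this Normalize actually does to a 0-255 float image, here is a small worked example (my own sanity check, not part of the training code):

import torch
from torchvision import transforms

# Normalize computes (x - mean) / std per channel, so with
# mean = -0.5/0.5 = -1 and std = 1/0.5 = 2, a 0-255 float image maps
# 0 -> 0.5 and 255 -> 128, which matches the 0.5 / ~127 values in the
# tensor dumps below.
norm = transforms.Normalize(mean=[-0.5/0.5], std=[1/0.5])
x = torch.tensor([[[0.0, 127.0, 255.0]]])  # a 1x1x3 toy "image" (C, H, W)
print(norm(x))  # tensor([[[  0.5000,  64.0000, 128.0000]]])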

Tensor output, tensor size, and class, followed by the train/test run at the end:

(tensor([[[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  19.5000,
119.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
127.0000,  93.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
127.0000, 106.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
127.0000, 106.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
127.0000, 106.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.5000,
127.5000, 107.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  63.5000,
127.0000, 106.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  59.0000,
127.0000,  58.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
127.0000,  66.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
127.0000, 106.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  33.0000,
128.0000, 107.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  32.5000,
127.0000,  88.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  59.5000,
127.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.0000,
127.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.0000,
127.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.5000,
128.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.0000,
127.0000,  54.0000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.0000,
127.0000,  60.0000,   8.0000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  85.0000,
127.0000, 127.5000,  84.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,  28.0000,
118.5000,  65.5000,  14.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000],
[  0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,
0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000,   0.5000]]]), 1)
torch.Size([1, 28, 28])
1
Train Epoch: 1 [0/25 (0%)]  Loss: -1.234500
Test set: Average loss: -1.6776, Accuracy: 1/5 (20%)
(tensor([[[68.1301, 67.3571, 68.4286, 67.9375, 69.5536, 69.2143, 69.0026,
69.2283, 70.4464, 70.2857, 68.8839, 68.6071, 71.3214, 70.5102,
71.0753, 71.9107, 71.5179, 71.5625, 73.6071, 71.9464, 73.2513,
72.5804, 73.5000, 74.1429, 72.7768, 72.9107, 73.1786, 74.9069],
[68.2028, 70.0714, 68.4821, 69.3661, 70.8750, 69.6607, 70.6569,
70.2551, 70.9464, 70.3393, 70.3929, 71.3571, 71.1250, 72.1901,
70.6850, 71.9464, 72.1071, 72.8304, 72.3036, 72.3214, 73.4528,
73.4898, 72.4286, 73.0179, 73.1071, 73.5179, 73.0357, 74.0280],
[71.3457, 70.4643, 70.4464, 70.7857, 70.6071, 71.9821, 71.6786,
72.7564, 72.4107, 72.2321, 72.8571, 72.7321, 70.0357, 72.2640,
73.8214, 72.8750, 73.0000, 73.0089, 74.8393, 74.1964, 74.9872,
73.4248, 72.0179, 74.5357, 74.9018, 74.9821, 75.0357, 72.9286],
[70.1429, 70.3750, 69.8750, 70.6250, 69.8750, 72.8750, 71.4107,
71.5089, 73.3750, 73.2500, 74.4375, 73.8750, 73.0000, 74.4375,
72.2768, 72.7500, 72.6250, 72.6250, 73.1250, 73.2500, 72.3571,
73.0625, 72.5000, 74.8750, 73.6875, 74.2500, 75.2500, 73.7411],
[53.1428, 56.1607, 57.4286, 58.3393, 60.6607, 59.3393, 62.2589,
62.8380, 64.1250, 66.6429, 66.9821, 67.8750, 74.7679, 70.5192,
68.7411, 69.3036, 66.0001, 67.9733, 67.4822, 68.3393, 68.3534,
69.5740, 69.4465, 70.9465, 69.0983, 72.2679, 70.4286, 70.1493],
[61.2143, 63.0000, 69.0357, 65.3393, 62.3214, 59.8036, 56.2730,
54.5829, 52.8393, 52.8929, 50.8304, 52.9107, 66.4643, 69.6875,
71.1849, 72.2678, 73.9821, 74.4643, 73.0357, 74.1250, 75.6492,
76.2360, 75.7679, 75.6071, 75.2857, 74.9286, 74.8929, 75.1850],
[54.9439, 62.5357, 69.7143, 72.0000, 71.2500, 74.1607, 75.9987,
79.6416, 79.5179, 81.4822, 77.3214, 75.2143, 49.6071, 59.7513,
71.4350, 74.4822, 73.5000, 73.8214, 72.2322, 73.7143, 73.9822,
74.5893, 74.7322, 74.8572, 76.2947, 71.5714, 73.4822, 74.8533],
[63.4298, 61.0357, 61.6072, 59.6697, 57.8036, 59.2322, 56.5982,
57.2079, 55.3393, 56.3572, 56.5804, 58.7322, 79.7499, 73.1900,
65.2423, 75.5357, 74.5356, 75.6250, 72.5893, 74.7321, 74.6135,
75.8852, 75.6964, 75.7678, 76.4286, 74.2500, 74.7857, 76.1671],
[63.7870, 60.3750, 67.5179, 67.5446, 66.7857, 66.2857, 66.4515,
68.5089, 68.5714, 67.0714, 68.5982, 66.7678, 57.3929, 67.2806,
68.9503, 72.9286, 74.0893, 73.4911, 74.2143, 73.3393, 72.4873,
73.3916, 71.7500, 75.4821, 73.8393, 74.8750, 74.6429, 75.0906],
[72.9260, 69.0178, 67.9643, 69.2321, 67.5178, 67.3750, 66.3814,
64.8890, 63.8572, 64.9464, 66.9821, 66.3928, 63.0000, 64.7449,
74.8800, 63.5178, 72.2143, 73.2321, 74.9286, 74.5893, 71.6938,
74.8635, 73.9107, 75.5536, 75.8036, 76.2857, 76.3750, 75.2564],
[72.1160, 69.5000, 72.0000, 69.4375, 71.2500, 70.5000, 72.3392,
73.5982, 71.5000, 72.3750, 68.8750, 67.1249, 65.3750, 60.2856,
61.6427, 65.3749, 67.4999, 65.0624, 70.4999, 69.4999, 65.3124,
71.9107, 69.7499, 72.8750, 72.5625, 72.7500, 74.8750, 73.7053],
[64.3763, 64.8571, 70.4642, 66.7857, 64.3214, 65.3928, 67.4859,
68.7385, 67.8750, 67.8750, 71.0267, 72.8749, 67.5356, 59.4106,
58.7625, 70.2319, 62.5534, 65.7141, 68.1249, 69.0713, 65.2013,
72.8392, 67.1427, 71.7500, 72.8482, 72.6071, 74.4285, 74.0051],
[69.7219, 71.8214, 67.4464, 68.6518, 66.0178, 66.1071, 65.5089,
65.6964, 65.6964, 61.0714, 61.4375, 61.8214, 67.8214, 61.8762,
57.3354, 66.8749, 63.8571, 60.3302, 62.9999, 67.8214, 68.9043,
71.6365, 67.5357, 75.6250, 74.6518, 73.6071, 74.5178, 75.3877],
[72.2857, 66.2857, 63.1964, 69.2232, 68.8214, 70.2857, 68.7895,
70.2436, 70.1250, 66.8750, 69.9643, 66.0893, 52.8393, 60.3201,
52.9273, 66.8571, 58.0535, 57.3035, 63.2321, 60.1785, 59.6058,
69.9936, 69.4286, 73.4821, 72.7143, 72.8750, 72.7500, 74.0791],
[65.7334, 56.6430, 60.7143, 67.8035, 66.5178, 65.8214, 67.6760,
67.3061, 65.6964, 64.5893, 53.1430, 68.4820, 52.7676, 48.1604,
48.1311, 65.3034, 51.9640, 61.8213, 59.6605, 57.3927, 54.6974,
75.5752, 73.1250, 74.3928, 74.0446, 72.2142, 72.2857, 77.7806],
[55.4095, 60.0893, 69.7142, 66.0892, 66.8750, 65.6607, 67.1926,
66.3712, 63.0000, 56.9465, 41.6073, 48.6609, 61.8035, 39.7281,
44.9195, 61.5892, 47.5891, 62.7678, 56.9641, 55.9820, 58.1236,
70.0548, 70.3750, 69.8392, 68.1517, 72.0535, 76.5893, 65.4489],
[60.6237, 66.5714, 67.8571, 65.7232, 66.2500, 67.6250, 66.9311,
67.3303, 64.8214, 48.9644, 45.9019, 49.4108, 51.6608, 43.9259,
47.5012, 38.9642, 37.5356, 66.0000, 65.5178, 49.3392, 57.3571,
67.8252, 69.7678, 70.2143, 51.7410, 76.1607, 69.7143, 54.4056],
[61.9643, 67.2500, 66.5000, 65.6875, 66.2500, 65.0000, 65.0625,
65.5268, 63.7500, 49.8750, 50.4375, 53.1250, 38.7500, 25.3750,
43.4286, 31.1250, 35.3750, 59.7500, 63.3750, 39.5000, 51.8125,
58.6249, 69.5000, 70.1250, 48.0000, 75.8750, 48.7500, 61.4018],
[67.8915, 65.7500, 66.3035, 66.5982, 66.0357, 64.9464, 65.4643,
65.8074, 63.4643, 56.2325, 48.3306, 54.9467, 22.0715, 23.6990,
29.0955, 27.3211, 29.4997, 57.8660, 68.2321, 36.9819, 50.7715,
52.6707, 69.7143, 71.3392, 55.5534, 45.7855, 62.9463, 64.1556],
[63.8431, 66.0893, 65.3571, 65.6161, 65.0893, 64.6964, 64.3444,
65.1225, 62.9107, 57.4287, 57.3216, 54.9287, 26.4465, 30.5689,
23.2499, 23.5534, 25.1605, 55.1071, 69.4643, 41.9642, 52.6619,
59.8954, 72.0893, 79.7322, 47.2856, 64.5000, 52.9463, 81.6888],
[64.2589, 69.9643, 71.5000, 75.2857, 77.6786, 78.6429, 76.2513,
71.0089, 67.5536, 60.8929, 57.2501, 48.1072, 22.4821, 44.3316,
17.5369, 24.3928, 22.8214, 45.4821, 67.8036, 35.4821, 43.7028,
52.7806, 81.8929, 56.7321, 60.5357, 44.2321, 82.6964, 72.7500],
[63.6748, 61.8929, 58.0001, 41.7859, 47.3037, 35.2502, 40.0525,
63.9669, 76.1962, 74.6603, 67.2228, 43.3748, 19.9821, 37.0776,
15.6544, 30.9823, 22.0182, 51.0984, 65.8215, 32.5717, 49.4747,
39.5946, 49.5359, 55.7859, 40.7681, 81.7857, 76.0357, 73.2832],
[60.0192, 53.6429, 43.5359, 44.8037, 39.9287, 48.8037, 48.3241,
35.5882, 22.6071, 20.7142, 33.8838, 45.3570, 25.0714, 32.6657,
26.8559, 22.9644, 27.7324, 69.4375, 62.5001, 33.9823, 48.6047,
33.4811, 38.3930, 58.5358, 74.2857, 73.2679, 68.8572, 71.0817],
[63.2500, 63.3393, 43.1608, 50.3751, 68.6786, 69.6429, 63.9324,
65.5510, 59.6249, 54.3035, 40.5267, 20.6071, 32.1785, 31.9834,
30.0791, 20.3036, 34.1073, 71.0000, 56.2322, 48.2501, 42.9695,
37.1225, 53.7322, 68.3750, 76.2232, 72.4822, 70.6072, 72.9324],
[63.1071, 64.1250, 65.7500, 41.7500, 26.2500, 25.6250, 25.1071,
24.1339, 18.8750, 23.5000, 35.5625, 44.5000, 31.1250, 37.3393,
28.3125, 23.6250, 39.3750, 67.1875, 60.7500, 53.2500, 41.6250,
39.1339, 61.2500, 81.0000, 71.3125, 70.8750, 71.5000, 72.1339],
[67.4796, 68.1429, 68.9821, 76.4286, 75.0893, 74.6250, 73.8419,
72.7398, 58.4108, 44.3572, 33.2322, 19.8036, 32.6965, 29.7296,
28.5957, 19.8750, 42.7499, 69.9196, 66.3214, 51.9285, 43.6848,
44.9017, 64.2857, 73.2857, 71.7321, 71.4286, 73.9286, 73.5893],
[67.7080, 67.9465, 68.0358, 69.1786, 69.1071, 69.7857, 69.0650,
70.3635, 60.1247, 52.3744, 52.1690, 44.3031, 30.2678, 29.7014,
20.1314, 25.4645, 45.8042, 74.2947, 63.4110, 56.0183, 49.2722,
50.1485, 73.1251, 74.6608, 74.3036, 73.8572, 72.2322, 74.1570],
[67.5868, 68.5179, 68.1786, 66.9018, 67.3215, 67.9822, 67.2628,
65.4694, 49.2318, 43.7318, 39.5888, 47.7318, 29.2499, 28.3277,
15.6326, 30.8215, 34.2502, 64.6428, 63.3572, 63.0001, 50.1688,
51.6037, 77.5000, 75.8215, 73.7501, 74.9286, 74.3572, 74.6097]]]), 20)
torch.Size([1, 28, 28])
20
Train Epoch: 1 [0/8 (0%)]   Loss: -1.982941
Test set: Average loss: 0.0000, Accuracy: 0/2 (0%)

Error message

The output above is from the runs where it succeeds without a segfault; the segfault happens roughly 4 times out of 5. When it does occur, it never happens while handling the MNIST subset, only when accessing my custom dataset, whether dataset[0], dataset[1], or really any index. Still, if I run a simple print statement on any given index enough times, I can get it to produce output at least once without crashing. Here is an occasion where it crashed more gracefully (it printed the tensor info and size/class, but crashed in train):

torch.Size([1, 28, 28])
65
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
989         try:
--> 990             data = self._data_queue.get(timeout=timeout)
991             return (True, data)
9 frames
/usr/lib/python3.7/queue.py in get(self, block, timeout)
178                         raise Empty
--> 179                     self.not_empty.wait(remaining)
180             item = self._get()
/usr/lib/python3.7/threading.py in wait(self, timeout)
299                 if timeout > 0:
--> 300                     gotit = waiter.acquire(True, timeout)
301                 else:
/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/signal_handling.py in handler(signum, frame)
65         # Python can still get and update the process status successfully.
---> 66         _error_if_any_worker_fails()
67         if previous_handler is not None:
RuntimeError: DataLoader worker (pid 1132) is killed by signal: Segmentation fault. 
The above exception was the direct cause of the following exception:
RuntimeError                              Traceback (most recent call last)
<ipython-input-9-02c9a53ca811> in <module>()
68 
69 if __name__ == '__main__':
---> 70     main()
<ipython-input-9-02c9a53ca811> in main()
60 
61     for epoch in range(1, args.epochs + 1):
---> 62         train(args, model, device, train_loader, optimizerAdadelta, epoch)
63         test(model, device, test_loader)
64         scheduler.step()
<ipython-input-6-93be0b7e297c> in train(args, model, device, train_loader, optimizer, epoch)
2 def train(args, model, device, train_loader, optimizer, epoch):
3     model.train()
----> 4     for batch_idx, (data, target) in enumerate(train_loader):
5         data, target = data.to(device), target.to(device)
6         optimizer.zero_grad()
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in __next__(self)
519             if self._sampler_iter is None:
520                 self._reset()
--> 521             data = self._next_data()
522             self._num_yielded += 1
523             if self._dataset_kind == _DatasetKind.Iterable and 
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
1184 
1185             assert not self._shutdown and self._tasks_outstanding > 0
-> 1186             idx, data = self._get_data()
1187             self._tasks_outstanding -= 1
1188             if self._dataset_kind == _DatasetKind.Iterable:
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _get_data(self)
1140         elif self._pin_memory:
1141             while self._pin_memory_thread.is_alive():
-> 1142                 success, data = self._try_get_data()
1143                 if success:
1144                     return data
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
1001             if len(failed_workers) > 0:
1002                 pids_str = ', '.join(str(w.pid) for w in failed_workers)
-> 1003                 raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
1004             if isinstance(e, queue.Empty):
1005                 return (False, None)
RuntimeError: DataLoader worker (pid(s) 1132) exited unexpectedly

In general, though, the run simply seems to "crash for an unknown reason". Here are my logs when that happens:

[logs]

What I think is happening / what I have tried

I think something is wrong with the tensor values and how the images are being read. I only ever work with at most 40 images at a time, so disk space or RAM on Google Colab should not be the reason for the failure. I may be normalizing the data incorrectly; I have tried different values, but that has not fixed it. Maybe the images are corrupted?
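
One conventional alternative (just a sketch, not a confirmed fix; these values are assumptions rather than something taken from my training code) would be to scale the 0-255 float tensor into [0, 1] before normalizing, instead of feeding raw 0-255 values through Normalize:

tf = transforms.Compose([
    transforms.Resize(size=(28, 28)),
    transforms.Lambda(lambda x: x / 255.0),       # bring 0-255 floats into [0, 1]
    transforms.Normalize(mean=[0.5], std=[0.5]),  # then roughly into [-1, 1]
])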

I honestly have no idea what else it could be; otherwise I would have solved it already. I think I have provided ample information here for someone experienced in the field to pinpoint the problem. I have put a lot of time into this post, and I hope someone can help me track down the root cause.

If there are any other obvious problems with my code, or with my use of the network and the custom Dataset, please point them out as well, since this is my first time working with PyTorch.

Thanks!

Other information that may or may not be relevant

Custom Dataset class:

# ------------ Custom Dataset Class ------------
import os
import pandas as pd
import torch
import torchvision
from torch.utils.data import Dataset

class PhytoplanktonImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform, target_transform):
        self.img_labels = pd.read_csv(annotations_file)  # image name / label pairs from csv
        self.img_dir = img_dir                           # directory holding all image files
        self.transform = transform                       # transforms to apply to images
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)  # number of rows in the csv file

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = torchvision.io.read_image(path=img_path)  # grayscale uint8 tensor, 0-255
        image = image.type(torch.FloatTensor)             # now a FloatTensor
        label = self.img_labels.iloc[idx, 1]              # label from csv
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label
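
For completeness, a sketch of how the class gets instantiated (the csv filename is a placeholder; tf is the Compose from above):

dataset = PhytoplanktonImageDataset(
    annotations_file='annotations.csv',  # placeholder for the (img_name, class) csv
    img_dir='/content/gdrive/My Drive/Colab Notebooks/all_images/sample_10',
    transform=tf,
    target_transform=None,
)
example, label = dataset[0]
print(example.shape)  # e.g. torch.Size([1, 28, 28])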

NeuralNetwork class (the only change for the MNIST version is that the final nn.Linear() has 10 outputs):

import torch
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 90),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
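
A quick shape sanity check on this model (illustrative only, not part of the training script):

model = NeuralNetwork()
x = torch.rand(64, 1, 28, 28)  # a batch shaped like the transformed images
print(model(x).shape)          # torch.Size([64, 90])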

Args used:

args = parser.parse_args(['--batch-size', '64', '--test-batch-size', '64',
                          '--epochs', '1', '--lr', '0.01', '--gamma', '0.7',
                          '--seed', '4', '--log-interval', '10'])

Edit: On one of the runs I got a graceful exit with the following (this traceback goes down into the read_image call inside __getitem__):

<ipython-input-3-ae5ff8635158> in __getitem__(self, idx)
13     img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0]) # image path
14     print(img_path)
---> 15     image = torchvision.io.read_image(path=img_path) # Reading image to 1 dimensional GRAY Tensor uint between 0-255
16     image = image.type(torch.FloatTensor) # Now a FloatTensor (not a ByteTensor)
17     label = self.img_labels.iloc[idx,1] # getting label from csv
/usr/local/lib/python3.7/dist-packages/torchvision/io/image.py in read_image(path, mode)
258     """
259     data = read_file(path)
--> 260     return decode_image(data, mode)
/usr/local/lib/python3.7/dist-packages/torchvision/io/image.py in decode_image(input, mode)
237         output (Tensor[image_channels, image_height, image_width])
238     """
--> 239     output = torch.ops.image.decode_image(input, mode.value)
240     return output
241 
RuntimeError: Internal error.

Here is the image path printed just before the decode failure: /content/gdrive/My Drive/Colab Notebooks/all_images/sample_10/D20190926T145532_FCB122_00013.png. Here is what that image looks like: [image]

Information about this image:

Color model: Gray
Depth: 16
Pixel height: 50
Pixel width: 80
Image DPI: 72 pixels per inch
File size: 3,557 bytes
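
To check whether decoding itself (rather than training) is the problem, one option is to try decoding every file listed in the csv up front, so a bad file fails with a readable path instead of killing a DataLoader worker. A minimal sketch (the csv filename is a placeholder):

import os
import pandas as pd
import torchvision

labels = pd.read_csv('annotations.csv')  # placeholder, same (img_name, class) layout
img_dir = '/content/gdrive/My Drive/Colab Notebooks/all_images/sample_10'
for name in labels.iloc[:, 0]:
    path = os.path.join(img_dir, name)
    try:
        torchvision.io.read_image(path)  # raises RuntimeError on a bad/unsupported png
    except RuntimeError as err:
        print(f'decode failed: {path} ({err})')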

I suggest looking at the num_workers parameter of your DataLoader. If num_workers is set too high, it can cause this error, so I suggest lowering it to zero, or lowering it until you no longer get the error.
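
For reference, a minimal sketch of that change (every argument except num_workers is a placeholder):

from torch.utils.data import DataLoader

train_loader = DataLoader(dataset, batch_size=64, shuffle=True,
                          num_workers=0)  # 0 = load in the main process, so a crash
                                          # surfaces as a readable traceback instead
                                          # of a killed worker process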

Sarthak
