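I trained my network with a particular configuration and saved a snapshot of it.
Now I am trying to resume from the last snapshot, but it fails with the following error message: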
I0328 13:44:30.756110 24238 net.cpp:283] Network initialization done.
I0328 13:44:30.756206 24238 solver.cpp:60] Solver scaffolding done.
I0328 13:44:30.757062 24238 caffe.cpp:209] Resuming from /media/hossein/tmpstore/caffe_new/examples/cifar10/cifar10_full_relu_bn_iter_60000.caffemodel.h5
HDF5-DIAG: Error detected in HDF5 (1.8.15-patch1) thread 0:
#000: H5D.c line 358 in H5Dopen2(): not found
major: Dataset
minor: Object not found
#001: H5Gloc.c line 430 in H5G_loc_find(): can't find object
major: Symbol table
minor: Object not found
#002: H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#003: H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
major: Symbol table
minor: Callback failed
#004: H5Gloc.c line 385 in H5G_loc_find_cb(): object 'iter' doesn't exist
major: Symbol table
minor: Object not found
F0328 13:44:30.786376 24238 hdf5.cpp:153] Check failed: status >= 0 (-1 vs. 0) Failed to load int dataset with name iter
*** Check failure stack trace: ***
@ 0x7f2d6e635daa (unknown)
@ 0x7f2d6e635ce4 (unknown)
@ 0x7f2d6e6356e6 (unknown)
@ 0x7f2d6e638687 (unknown)
@ 0x7f2d6ed74acd caffe::hdf5_load_int()
@ 0x7f2d6ed678d0 caffe::SGDSolver<>::RestoreSolverStateFromHDF5()
@ 0x7f2d6ed4bf19 caffe::Solver<>::Restore()
@ 0x408038 train()
@ 0x405a0c main
@ 0x7f2d6d943ec5 (unknown)
@ 0x406141 (unknown)
@ (nil) (unknown)
Aborted (core dumped)
This is how I am trying to resume it:
#!/usr/bin/env sh
TOOLS=./build/tools
$TOOLS/caffe train \
    --solver=examples/cifar10/cifar10_full_solver_bn_lr2.prototxt \
    --snapshot=/media/hossein/tmpstore/caffe_new/examples/cifar10/cifar10_full_relu_bn_iter_60000.caffemodel.h5
Then I gave up on that and tried using BINARYPROTO instead of HDF5, but I got this error:
I0328 16:35:34.721277 27243 net.cpp:283] Network initialization done.
I0328 16:35:34.721369 27243 solver.cpp:60] Solver scaffolding done.
I0328 16:35:34.722338 27243 caffe.cpp:209] Resuming from /media/hossein/tmpstore/caffe_new/examples/cifar10_full_relu_bn_iter_60000.caffemodel
F0328 16:35:39.143900 27243 sgd_solver.cpp:316] Check failed: state.history_size() == history_.size() (0 vs. 28) Incorrect length of history blobs.
*** Check failure stack trace: ***
@ 0x7fd1c2cbbdaa (unknown)
@ 0x7fd1c2cbbce4 (unknown)
@ 0x7fd1c2cbb6e6 (unknown)
@ 0x7fd1c2cbe687 (unknown)
@ 0x7fd1c33ef097 caffe::SGDSolver<>::RestoreSolverStateFromBinaryProto()
@ 0x7fd1c33d1ed3 caffe::Solver<>::Restore()
@ 0x408038 train()
@ 0x405a0c main
@ 0x7fd1c1fc9ec5 (unknown)
@ 0x406141 (unknown)
@ (nil) (unknown)
Aborted (core dumped)
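When I try this at different times with different models, the numbers in the history check change (for example, 58 vs. 28 or 32 vs. 28). The overall error is the same, but the numbers differ.
What should I do? This is driving me crazy!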
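As the value of the --snapshot argument, you must pass the .solverstate.h5 file, not the .caffemodel.h5 file. The .caffemodel file holds only the learned weights; the solver state (the iteration counter and the update history) is stored in the .solverstate file, which is why the restore step cannot find 'iter' or the history blobs. The same applies to BINARYPROTO snapshots: pass .solverstate, not .caffemodel.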
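
A minimal sketch of the corrected resume script. It assumes the solver state file sits next to the weights file with the same prefix, which is how Caffe names its snapshots. The helper solverstate_for is a hypothetical name introduced here for illustration and is not part of Caffe. The guard around the caffe call is also an addition, so the script only prints the path when the binary or the file is missing.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of Caffe): map a weights snapshot path to
# the solver-state path that shares its prefix.
solverstate_for() {
    case "$1" in
        *.caffemodel.h5) printf '%s\n' "${1%.caffemodel.h5}.solverstate.h5" ;;
        *.caffemodel)    printf '%s\n' "${1%.caffemodel}.solverstate" ;;
        *)               printf '%s\n' "$1" ;;  # leave other paths unchanged
    esac
}

TOOLS=./build/tools
WEIGHTS=/media/hossein/tmpstore/caffe_new/examples/cifar10/cifar10_full_relu_bn_iter_60000.caffemodel.h5
STATE=$(solverstate_for "$WEIGHTS")

# Resume only if the caffe binary and the solver-state file both exist;
# otherwise just report which file would be used.
if [ -x "$TOOLS/caffe" ] && [ -f "$STATE" ]; then
    "$TOOLS/caffe" train \
        --solver=examples/cifar10/cifar10_full_solver_bn_lr2.prototxt \
        --snapshot="$STATE"
else
    echo "would resume from: $STATE"
fi
```

If the .solverstate.h5 file for iteration 60000 is not in that directory, check the snapshot_prefix setting in the solver prototxt to see where the snapshots were actually written.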