CPU/Runtime error when using learn.fit_one_cycle() in fastai/PyTorch



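This is my first time properly training a CNN model with fastai, on a laptop with 16 GB of RAM. I am trying to follow a tutorial that uses the following code: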

np.random.seed(42)
data = vision.ImageDataBunch.from_folder(path, valid_pct=0.2,
ds_tfms=get_transforms(), size=224, num_workers=4, bs=32).normalize(imagenet_stats)
data.classes, data.c, len(data.train_ds), len(data.valid_ds)
learn = cnn_learner(data, models.resnet50, metrics=accuracy).to_fp16()
learn.fit_one_cycle(4)

When I try to run learn.fit_one_cycle(4), it returns this error:

epoch     train_loss  valid_loss  accuracy  time
c:\users\lu_41\fastai1\fastai\vision\transform.py:247: UserWarning: torch.solve is deprecated in favor of torch.linalg.solveand will be removed in a future PyTorch release.
torch.linalg.solve has its arguments reversed and does not return the LU factorization.
To get the LU factorization see torch.lu, which can be used with torch.lu_solve or torch.lu_unpack.
X = torch.solve(B, A).solution
should be replaced with
X = torch.linalg.solve(A, B) (Triggered internally at  ..\aten\src\ATen\native\BatchLinearAlgebra.cpp:859.)
  return _solve_func(B,A)[0][:,0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\users\lu_41\fastai1\fastai\train.py", line 23, in fit_one_cycle
    learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
  File "c:\users\lu_41\fastai1\fastai\basic_train.py", line 200, in fit
    fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
  File "c:\users\lu_41\fastai1\fastai\basic_train.py", line 101, in fit
    loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
  File "c:\users\lu_41\fastai1\fastai\basic_train.py", line 26, in loss_batch
    out = model(*xb)
  File "C:\Users\lu_41\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\lu_41\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "C:\Users\lu_41\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\lu_41\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "C:\Users\lu_41\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\lu_41\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\lu_41\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'

Does anyone know what I can try to fix this? Is it related to my CPU?

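Edit: It turns out I was using an old version of fastai and calling some old and/or deprecated functions. Following the latest documentation fixed the problem.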

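Try changing to_fp16() to to_fp32(). The traceback shows the convolution running on the CPU, and the error says this PyTorch build has no half-precision (fp16) conv2d kernel for the CPU.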
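To illustrate what that switch amounts to, here is a minimal plain-PyTorch sketch. The small `nn.Sequential` network is a hypothetical stand-in for the model inside the learner, and treating `model.float()` as roughly what `to_fp32()` does to the weights is an assumption, not a description of fastai internals.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the network inside the learner.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())

# Half precision casts the weights to float16.
model = model.half()
half_dtype = next(model.parameters()).dtype

# Casting back to float32 makes the forward pass safe on the CPU.
model = model.float()
x = torch.randn(2, 3, 32, 32)  # float32 batch on the CPU
with torch.no_grad():
    out = model(x)

print(half_dtype, out.dtype)
```

In the fastai code from the question, the equivalent change would be to drop `.to_fp16()` from the `cnn_learner(...)` line when no GPU is in use.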

I had the same error. If you really want to use fp16, there are two options:

  1. Use bfloat16 instead of float16
  2. Move the data and the model to the GPU and try again
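For option 1, the idea can be sketched in plain PyTorch as below. The tiny network and the random batch are hypothetical placeholders, and this sketch assumes a PyTorch build whose CPU convolution supports bfloat16; whether fastai's own mixed-precision helpers can be pointed at bfloat16 is not covered here.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real network.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())

# Cast both the weights and the inputs to bfloat16 so their dtypes match.
model = model.to(torch.bfloat16)
x = torch.randn(2, 3, 32, 32).to(torch.bfloat16)

with torch.no_grad():
    out = model(x)

print(out.dtype)
```

bfloat16 keeps the exponent range of float32 but with fewer mantissa bits, so expect lower precision than float32.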

If you go with option 2, you can use the following:

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

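If a CUDA GPU is available, device will be cuda:0; otherwise it falls back to cpu. With a GPU, you can then move the model and data to it with the following: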

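# `model` is the nn.Module; `data` is one (inputs, labels) batch.
# float16 inputs need float16 weights too, hence .half() on the model.
model = model.half().to(device)
inputs, labels = data[0].type(torch.float16).to(device), data[1].to(device)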
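Putting the pieces together, a device-aware version might look like the sketch below. It uses float16 only when CUDA is available and falls back to float32 on the CPU, so the same script runs on both. The small network and the random `data` batch are hypothetical placeholders, not the tutorial's model or dataset.

```python
import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Use half precision only on the GPU; stay in float32 on the CPU.
dtype = torch.float16 if device.type == 'cuda' else torch.float32

# Hypothetical stand-in for the real network (2 output classes).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 2),
).to(device=device, dtype=dtype)

# Hypothetical batch: (inputs, labels).
data = (torch.randn(4, 3, 32, 32), torch.randint(0, 2, (4,)))
inputs = data[0].to(device=device, dtype=dtype)
labels = data[1].to(device)  # integer labels keep their dtype

with torch.no_grad():
    logits = model(inputs)

print(device, logits.dtype, logits.shape)
```

Keeping the model and the inputs on the same device and in the same dtype avoids both the 'Half' error on the CPU and dtype-mismatch errors on the GPU.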
