PyTorch on M1 Mac: RuntimeError: Placeholder storage has not been allocated on MPS device



I'm training a model in PyTorch 1.13.0 (I also tried it with the nightly build torch-1.14.0.dev20221207 on an M1 Mac, to no avail) and want to use MPS hardware acceleration. In my project I have the following relevant code to send the model and input tensors to MPS:

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu") # This always results in MPS
model.to(device)

…and in my Dataset subclass:

class MyDataset(Dataset):
    def __init__(self, df, window_size):
        self.df = df
        self.window_size = window_size
        self.data = []
        self.labels = []
        for i in range(len(df) - window_size):
            x = torch.tensor(df.iloc[i:i+window_size].values, dtype=torch.float, device=device)
            y = torch.tensor(df.iloc[i+window_size].values, dtype=torch.float, device=device)
            self.data.append(x)
            self.labels.append(y)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

This produces the following traceback on my first training step:

Traceback (most recent call last):
  File "lstm_model.py", line 263, in <module>
    train_losses, val_losses = train_model(model, criterion, optimizer, train_loader, val_loader, epochs=100)
  File "lstm_model.py", line 212, in train_model
    train_loss += train_step(model, criterion, optimizer, x, y)
  File "lstm_model.py", line 191, in train_step
    y_pred = model(x)
  File "miniconda3/envs/pytenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "lstm_model.py", line 182, in forward
    out, _ = self.lstm(x, (h0, c0))
  File "miniconda3/envs/pytenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "miniconda3/envs/pytenv/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 774, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: Placeholder storage has not been allocated on MPS device!

I tried creating the tensors in my Dataset subclass without specifying a device and then calling .to(device) on them:

x = torch.tensor(df.iloc[i:i+window_size].values, dtype=torch.float)
x = x.to(device)
y = torch.tensor(df.iloc[i+window_size].values, dtype=torch.float)
y = y.to(device)

I also tried creating the tensors without specifying a device in my Dataset subclass and sending them to the device in my model's forward method and in my train_step function instead.

How can I resolve this error?

The likely problem with your code is that you aren't sending the inputs to the device inside your training loop. You need to send both the model and the inputs to the device, as you can read in this blog post.

Example code:

from tqdm import tqdm

def train(model, train_loader, device, epoch, *args):
    model.train()
    for it, batch in tqdm(enumerate(train_loader), desc="Epoch %s: " % epoch, total=len(train_loader)):
        # move each batch onto the device inside the loop
        batch = {'data': batch['data'].to(device), 'labels': batch['labels'].to(device)}
        # perform training
        ...

# set model and device
model = MyWonderfulModel(*args)
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model.to(device)

# call training function
train(model, train_loader, device, epoch, *args)

Running a training function like this works on my M1 Mac with MPS.
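To make the fix concrete, here is a minimal, self-contained sketch of the same pattern applied to the asker's train_step. The model, shapes, loss, and optimizer below are stand-ins (a small nn.Linear, not the asker's LSTM); the point is that the batches stay on the CPU until the training step moves them to the device:

```python
import torch
from torch import nn

# Fall back to CPU when MPS is unavailable, so the same script runs anywhere.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

def train_step(model, criterion, optimizer, x, y):
    # Move the batch to the model's device *here*, inside the step,
    # rather than creating device tensors inside the Dataset.
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    y_pred = model(x)
    loss = criterion(y_pred, y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical stand-in model and data, just to exercise the step.
model = nn.Linear(4, 1).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 4)  # CPU tensors, as a DataLoader would yield them
y = torch.randn(8, 1)
loss = train_step(model, criterion, optimizer, x, y)
```

Keeping the Dataset on the CPU also plays better with DataLoader workers, which run in separate processes and should not hold device tensors.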

Try changing this line:

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu") # This always results in MPS

to:

device = torch.device("mps")
