Dataset normalization, error: all elements of input should be between 0 and 1



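I have a data normalization problem in PyTorch when I try to run training. The first thing you need to know is that the dataset consists of 3024 signal windows (i.e. 1 channel), each 5000 samples long, so the CSV file has dimensions 5000x3024. Each signal has one label that needs to be predicted. Here is the code I use to load and normalize the data: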

import numpy as np
from pandas import read_csv
from sklearn.preprocessing import LabelEncoder
from torch.utils.data import Dataset, DataLoader

class CSVDataset(Dataset):
    # load the dataset
    def __init__(self, path, normalize = False):
        # load the csv file as a dataframe
        df = read_csv(path)
        df = df.transpose()
        # store the inputs and outputs
        self.X = df.values[:, :-1]
        self.y = df.values[:, -1]
        print("Dataset length: ", self.X.shape[0])
        # ensure input data is floats
        self.X = self.X.astype(float)
        self.y = self.y.astype(float)

        if normalize:
            self.X = self.X.reshape(self.X.shape[1], self.X.shape[0])
            min_X = np.min(self.X,0)  # returns an array of minimums for each signal window
            max_X = np.max(self.X,0)
            self.X = (self.X - min_X)/(max_X-min_X)
            min_y = np.min(self.y)
            max_y = np.max(self.y)
            self.y = (self.y - min_y)/(max_y-min_y)

        # reshape input data
        self.X = self.X.reshape(self.X.shape[0], 1, self.X.shape[1])
        self.y = self.y.reshape(self.y.shape[0], 1)
        # label encode target and ensure the values are floats
        self.y = LabelEncoder().fit_transform(self.y)
        self.y = self.y.astype(float)

# prepare the dataset
def prepare_data(path):
    # load the dataset
    dataset = CSVDataset(path, normalize = True)
    # calculate split
    train, test = dataset.get_splits()
    # prepare data loaders
    train_dl = DataLoader(train, batch_size=32, shuffle=True)
    test_dl = DataLoader(test, batch_size=1024, shuffle=False)
    return train_dl, test_dl

And the train method is:

import torch
from torch.nn import BCELoss
from torch.optim import SGD

def train_model(train_dl, model):
    # define the optimization
    criterion = BCELoss()
    optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
    model = model.float()
    # enumerate epochs
    for epoch in range(100):
        # enumerate mini batches
        for i, (inputs, targets) in enumerate(iter(train_dl)):
            targets = torch.reshape(targets, (32, 1))
            # clear the gradients
            optimizer.zero_grad()
            # compute the model output
            yhat = model(inputs.float())
            # calculate loss
            loss = criterion(yhat, targets.float())
            # credit assignment
            loss.backward()
            # update model weights
            optimizer.step()

The error I get is on the line loss = criterion(yhat, targets.float()), and it says:

RuntimeError: all elements of input should be between 0 and 1

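I tried inspecting X in the variable explorer, and there do not seem to be any values outside the range 0 to 1. I don't know what I am doing wrong in the normalization. Can you help me?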

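Built-in loss functions use the names input and target to refer to the prediction and the label, respectively. The error message should therefore be read as "input of the criterion", i.e. yhat, and not as "input of the model".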
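To see this concretely, here is a minimal sketch that reproduces the error on CPU. The tensors are made-up stand-ins for yhat and targets (the question does not show the model), so treat the values as assumptions. The point is that the range to check is that of the prediction, not of the normalized X.

```python
import torch
from torch import nn

criterion = nn.BCELoss()

# Hypothetical stand-ins: raw model outputs (logits) and binary labels.
yhat = torch.tensor([[2.5], [-1.3], [0.4]])
targets = torch.tensor([[1.0], [0.0], [1.0]])

# Inspect the range of the prediction passed to the criterion.
print("yhat range:", yhat.min().item(), yhat.max().item())

# BCELoss rejects predictions outside [0, 1], whatever the range of X is.
try:
    criterion(yhat, targets)
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```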

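It seems yhat does not lie in [0, 1], while BCELoss expects a probability, not a logit. You can either:

  • add a sigmoid layer as the last layer of your model, or

  • use nn.BCEWithLogitsLoss instead, which combines a sigmoid and the BCE loss.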
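Both options can be sketched as follows. The question does not show the model, so `backbone` here is a hypothetical single linear layer over a flattened 1x5000 window, and the inputs and labels are random placeholders. Only the way the loss is wired up is the point.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical model: the real architecture is not shown in the question.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(5000, 1))

# Placeholder batch: 4 one-channel windows of 5000 samples, with binary labels.
inputs = torch.rand(4, 1, 5000)
targets = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

# Option 1: end the model with a sigmoid so it outputs probabilities for BCELoss.
model_with_sigmoid = nn.Sequential(backbone, nn.Sigmoid())
probs = model_with_sigmoid(inputs)
loss_a = nn.BCELoss()(probs, targets)

# Option 2: keep the raw logits and let BCEWithLogitsLoss apply the sigmoid.
logits = backbone(inputs)
loss_b = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_a.item(), loss_b.item())
```

The two losses agree up to floating-point error. With the second option the model outputs logits, so apply torch.sigmoid to the output at inference time if you need probabilities. The PyTorch documentation describes BCEWithLogitsLoss as the more numerically stable of the two.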
