ValueError: logits and labels must have the same shape ((None, 10) vs (None, 1))

I am new to TensorFlow and I am trying to build a simple model that outputs the probability of an install (the install column).

Here is a subset of the dataset:

{'A': {0: 12, 2: 28, 3: 26, 4: 9, 5: 36},
'B': {0: 10, 2: 17, 3: 22, 4: 2, 5: 31},
'C': {0: 1, 2: 0, 3: 5, 4: 0, 5: 1},
'D': {0: 5, 2: 0, 3: 0, 4: 0, 5: 0},
'E': {0: 12, 2: 1, 3: 4, 4: 3, 5: 1},
'F': {0: 12, 2: 2, 3: 14, 4: 9, 5: 11},
'install': {0: 0, 2: 0, 3: 1, 4: 0, 5: 0},
'G': {0: 21, 2: 12, 3: 8, 4: 13, 5: 19},
'H': {0: 0, 2: 5, 3: 1, 4: 6, 5: 5},
'I': {0: 21, 2: 22, 3: 5, 4: 10, 5: 20},
'J': {0: 0.0, 2: 136.5, 3: 0.0, 4: 0.1, 5: 29.5},
'K': {0: 0.15220949263502456,
2: 0.08139534883720931,
3: 0.15625,
4: 0.15384584755440725,
5: 0.04188829787234043},
'L': {0: 649, 2: 379, 3: 531, 4: 660, 5: 242},
'M': {0: 0, 2: 0, 3: 0, 4: 1, 5: 1},
'N': {0: 1, 2: 1, 3: 1, 4: 0, 5: 0},
'O': {0: 0, 2: 1, 3: 0, 4: 1, 5: 0},
'P': {0: 0, 2: 0, 3: 0, 4: 0, 5: 0},
'Q': {0: 1, 2: 0, 3: 1, 4: 0, 5: 1}}

Here is the code I am working with:

X = df.drop('install', axis=1) #data
y = df['install'] #target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42, test_size = 0.3)
X_train = ss.fit_transform(X_train)
X_test = ss.fit_transform(X_test)
model = keras.models.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='softmax'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(10)
])
loss = keras.losses.BinaryCrossentropy(from_logits=True)
optim = keras.optimizers.Adam(lr=0.001)
metrics = ["accuracy"]
model.compile(loss=loss, optimizer=optim, metrics=metrics)
batch_size = 32
epoch = 5
model.fit(X_train, y_train, batch_size=batch_size, epochs=epoch, shuffle=True, verbose=1)

Can you help me understand this error? I understand that the problem lies in the shapes of X and y.

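Note: You did not specify which class the ss object belongs to, so I will leave it out of the discussion.

First, let's discuss your target, i.e. the install column. Based on its values, I assume your problem is binary classification, i.e. predicting 0 or 1, and that you want the probability of each.

For that, you have to define your model as follows.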

model = keras.models.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(2, activation='softmax')
])
'''
Note: I have changed the activation of the first `Dense` layer from
`softmax` to `relu`. `softmax` is not well suited to inner layers because it
greatly reduces the information coming from each node. Using `softmax` there
will not raise any error, but it is methodologically wrong.
The next major change is the number of units in the last `Dense` layer,
from 10 to 2. What you want is the probability of the label being either 0
or 1. So if the output of your model is `[a, b]`, where a is a value
corresponding to 0 and b a value corresponding to 1, you can turn them into
probabilities with the `softmax` activation. Without an activation, the
values we get are called 'logits'.
'''
# Now you have to change your loss function as below
loss = tf.keras.losses.SparseCategoricalCrossentropy()
# The rest is the same. After training the model with your code, we run a quick trial prediction.
preds = model.predict(X_test)
preds
'''
This gives the results:
array([[9.9999726e-01, 2.7777487e-06],
[9.5156413e-01, 4.8435837e-02]], dtype=float32)
This says the probability of sample 1 being 0 is '9.9999726e-01', i.e.
'0.999..', and of it being 1 is '2.7777487e-06', i.e. '0.00000277..'. The
two values sum to 1. The same holds for sample 2.
'''
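
To make the softmax step concrete, here is a minimal sketch in plain Python (no TensorFlow) of how two logits `[a, b]` become probabilities. The function name `softmax` and the sample logits are illustrative assumptions, not values taken from the model above.

```python
import math

def softmax(logits):
    """Turn a list of raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits [a, b] for one sample: a scores class 0, b scores class 1.
probs = softmax([2.0, -1.0])
print(probs)       # one probability per class
print(sum(probs))  # 1 up to floating-point rounding
```

The larger logit gets the larger probability, which is why the index of the largest output can be read as the predicted class.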

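There is another way of doing this. Since you have only one label, if you have the probability corresponding to that label, you can get the probability of the other label by subtracting it from 1. You can implement it like this: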

model = keras.models.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1, activation='sigmoid')
])
'''
The difference between 'softmax' and 'sigmoid' is that 'softmax' is applied
across all the units of a layer jointly, while 'sigmoid' is applied to each
unit independently. So you can say that 'softmax' is applied on the 'layer'
and 'sigmoid' is applied on the 'units'.
The output of the 'sigmoid' is the probability of the result being 1. The
result is then either 0 or 1, depending on how the output probability
compares with some threshold, and hence we will now use a different loss,
BinaryCrossentropy, as the labels are binary (either 0 or 1).
'''
loss = keras.losses.BinaryCrossentropy() # again without logits
# We once again train the model using the rest of the code and analyze
# the outputs.
preds = model.predict(X_test)
preds
'''
This gives the results:
array([[1.6424768e-13],
[2.0349980e-06]], dtype=float32)
So for sample 1 we have the probability of it being '1' as '1.6424768e-13'
and as we have only '1' and '0' the probability of it being '0' is '1 -
1.6424768e-13'. Same for the sample 2.
'''
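
The relationship between the single sigmoid unit and the two-unit softmax can be checked in plain Python. This is a minimal sketch; the helper names and the logit value are illustrative assumptions, and the 0.5 threshold is just a common default rather than something the model prescribes.

```python
import math

def sigmoid(z):
    """Logistic function: maps a single logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

z = -1.5                        # hypothetical logit from the single output unit
p1 = sigmoid(z)                 # probability of label 1
p0 = 1.0 - p1                   # probability of label 0
same_p1 = softmax([0.0, z])[1]  # two-unit softmax with the class-0 logit fixed at 0

label = 1 if p1 >= 0.5 else 0   # thresholding turns the probability into a class
print(p0, p1, same_p1, label)
```

A sigmoid on one logit gives the same probability as a softmax over that logit and a fixed zero, which is why both model definitions describe the same kind of binary classifier.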

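Now, coming to @Mattpats's answer. That answer will also work, but in that case you will not get probabilities as output. Instead you will get logits, because no activation is used and the loss is computed on logits by specifying from_logits=True. To get probabilities from the logits, you have to apply a sigmoid as below: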

preds = model.predict(X_test)
sigmoid_preds = tf.math.sigmoid(preds).numpy()
preds, sigmoid_preds
'''
This gives the following results:
preds = array([[-51.056973],
[-32.444508]], dtype=float32)
sigmoid_preds = array([[6.702527e-23],
[8.119502e-15]], dtype=float32)
'''
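
The conversion can be reproduced without TensorFlow. The sketch below applies a numerically stable sigmoid to the two logits printed above; the helper name `stable_sigmoid` is an assumption for illustration, and the results only match the float32 output approximately.

```python
import math

def stable_sigmoid(z):
    """Sigmoid that avoids overflow in exp() for large negative or positive z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0, so exp(z) is small and cannot overflow
    return ez / (1.0 + ez)

# Logits copied from the preds array shown above.
logits = [-51.056973, -32.444508]
probs = [stable_sigmoid(z) for z in logits]
print(probs)  # very small probabilities of the label being 1
```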

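As currently written, you create labels y_train of shape (3,), where each training label is just 0 or 1. The network, however, is set up to expect training labels from 10 classes. That is what this line does at model creation: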

keras.layers.Dense(10)

To switch to binary classification, I suggest changing the last layer to

keras.layers.Dense(1, activation='sigmoid')

You also need to change the loss to:

loss = keras.losses.BinaryCrossentropy()
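
To see why this loss needs exactly one predicted probability per label, here is a minimal plain-Python sketch of the binary cross-entropy formula. The function name, the sample values, and the `eps` clipping constant are assumptions for illustration; library implementations may differ in details such as clipping and reduction.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over samples, one probability per label."""
    if len(y_true) != len(y_prob):
        # Same idea as the shape error in the question: the loss pairs each
        # label with exactly one prediction.
        raise ValueError("y_true and y_prob must have the same length")
    losses = []
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses)

y_true = [0, 1, 0]        # hypothetical labels, like the install column
y_prob = [0.1, 0.8, 0.3]  # hypothetical sigmoid outputs
loss = binary_crossentropy(y_true, y_prob)
print(loss)
```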

If you wanted a multi-class classification with 10 classes instead, you would need to change y_train into an array with 10 columns.
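
An array with 10 columns here means a one-hot encoding of the class index. The sketch below shows the idea in plain Python; the helper `to_one_hot` and the sample labels are hypothetical. In a Keras workflow, `tf.keras.utils.to_categorical` performs a similar conversion.

```python
def to_one_hot(labels, num_classes):
    """Encode integer class labels as rows with a single 1 per row."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows

# Hypothetical labels from a 10-class problem.
encoded = to_one_hot([0, 3, 9], num_classes=10)
for row in encoded:
    print(row)
```

Each row has 10 entries, so the label shape becomes (None, 10) and matches a final layer with 10 units.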

I believe the last layer of your network outputs 10 values when it should output 1.

model = keras.models.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='softmax'),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1)  # needs to be 1
])
