How to correctly predict negative values in a Keras regression problem



I am trying to adapt the standard Boston housing problem to my own dataset. The difference is that my dataset contains negative values, and I want the output to predict negative values as well.

From what I have read on StackOverflow about predicting negative values, I should use a tanh activation function on the output layer. I also understand that I should normalize the dataset to the range [-1, 1].
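
To make sure I understood that advice, here is a tiny standalone sketch of what I think it means (toy numbers only, not my real data): scale just the target into [-1, 1] so a tanh output can match it, and then undo the scaling on the predictions:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# toy target values, including negatives, in their original units
y = np.array([[24.0], [-5.3], [21.6], [-0.7]])

y_scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaled = y_scaler.fit_transform(y)            # now inside [-1, 1], so a tanh output can fit it

# after training, a tanh prediction gets mapped back to the original units
pred_scaled = np.array([[-0.17]])               # stand-in for model.predict(...)
pred_original = y_scaler.inverse_transform(pred_scaled)
print(y_scaled.ravel(), pred_original.ravel())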

So I have two questions. I have two variants of the code.

  1. Is my first code variant correct? I could not find any public dataset with negative values to test against, and I do not know how to make sure it works properly.

  2. In the second variant, my NN predicts a value of about "0.9", but the values in my dataset are more like "24". I think this is because there is no proper normalization in this code. Please tell me how to implement the normalization.

My experience with Keras is limited and my Python skills are not strong, so I have just tried to assemble the code from different sources.

First code:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow import keras
from keras import Sequential
from keras.layers import Dense
#read in training data
train_df = pd.read_csv('train.csv', index_col='ID')
train_df.head()
target = 'medv'
scaler = MinMaxScaler(feature_range=(-1, 1)) ## originally this was (0, 1)
scaled_train = scaler.fit_transform(train_df)
# Print out the adjustment that the scaler applied to the medv column of data
print("Note: median values were scaled by multiplying by {:.10f} and adding {:.6f}".format(scaler.scale_[13], scaler.min_[13]))
multiplied_by = scaler.scale_[13]
added = scaler.min_[13]
scaled_train_df = pd.DataFrame(scaled_train, columns=train_df.columns.values)
#build our model
model = Sequential()
model.add(Dense(50, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='tanh')) # originally there was no activation here
model.compile(loss='mean_squared_error', optimizer='adam')
X = scaled_train_df.drop(target, axis=1).values
Y = scaled_train_df[[target]].values
# Train the model
model.fit(
    X[10:],
    Y[10:],
    epochs=100,
    shuffle=True,
    verbose=2
)
#inference
prediction = model.predict(X[:4])
y_0 = prediction[0][0]
print('Prediction with scaling - {}'.format(y_0))
y_0 -= added
y_0 /= multiplied_by
print("Housing Price Prediction  - ${}".format(y_0))

Prediction with scaling - -0.1745799034833908
Housing Price Prediction  - $23.571952171623707
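
As a sanity check on question 1, I compared the manual un-scaling above (subtract scaler.min_, divide by scaler.scale_) with scaler.inverse_transform on a toy column (assuming medv runs from 5 to 50, as in the standard Boston data). Both give back the same price, so at least the un-scaling arithmetic looks right:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# toy stand-in for the medv column, spanning 5-50 like the standard Boston data
medv = np.array([[5.0], [24.0], [50.0]])
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(medv)

scaled_pred = -0.1745799034833908               # the network output printed above

# manual inversion, exactly as in the code: subtract min_, then divide by scale_
manual = (scaled_pred - scaler.min_[0]) / scaler.scale_[0]

# sklearn's built-in inversion
builtin = scaler.inverse_transform([[scaled_pred]])[0][0]

print(manual, builtin)                          # both around 23.57, matching the printed price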

Second variant of the code:

# Regression Example With Boston Dataset: Standardized and Larger
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy
# load dataset
dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
Y = dataset[:,13]
# define the model
def larger_model():
    # create model
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='tanh'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=50, batch_size=5, verbose=1)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Larger: %.2f (%.2f) MSE" % (results.mean(), results.std()))
pipeline.fit(X, Y)
#prediction = pipeline.predict(numpy.array([[0.0273, 0., 7.07, 0., 0.469, 6.421, 78.9, 4.9671, 2., 242., 17.8, 396.9, 9.14]]))
prediction = pipeline.predict(numpy.array([[0.7258, 0., 8.14, 0., 0.538, 5.727, 69.5, 3.7965, 4., 307., 21.0, 390.95, 11.28]]))
print(prediction)

Results:

......
......
102/102 [==============================] - 0s 927us/step - loss: 548.0819
Epoch 50/50
102/102 [==============================] - 0s 912us/step - loss: 548.0818
1/1 [==============================] - 0s 0s/step
0.99998754
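
From what I can tell, Y is never scaled in this variant, so the tanh output simply saturates near +1 and the loss gets stuck. A quick back-of-the-envelope check (assuming the usual Boston medv statistics, mean of about 22.5 and std of about 9.2) reproduces the plateau of about 548:

# If every prediction is pinned at tanh's upper bound of ~1.0, the MSE against the
# raw medv targets is roughly var(medv) + (mean(medv) - 1)^2. With mean ~22.53 and
# std ~9.19 (standard Boston figures) that gives:
print(9.19 ** 2 + (22.53 - 1.0) ** 2)   # ~548, matching the stuck loss above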

Link to train.csv

Link to house.csv

In the final output layer you are using a tanh activation, and that is the problem. The tanh activation function only produces outputs between -1 and +1, so it can never reach target values like 24. You can try a linear activation function instead of tanh.
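
Below is a minimal sketch of that change, reusing the structure of your second variant (it assumes X and Y are loaded exactly as in your code; the TransformedTargetRegressor part is optional and only adds the target scaling you asked about in question 2):

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.compose import TransformedTargetRegressor

def linear_output_model():
    model = Sequential()
    model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))  # linear output: any real value, including negatives
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

pipeline = Pipeline([
    ('standardize', StandardScaler()),
    ('mlp', KerasRegressor(build_fn=linear_output_model, epochs=50, batch_size=5, verbose=0)),
])

# optional: also standardize the target and invert it automatically at predict time
regressor = TransformedTargetRegressor(regressor=pipeline, transformer=StandardScaler())
regressor.fit(X, Y)
print(regressor.predict(X[:1]))  # predictions come back in the original medv units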
