How do I implement my for loop and call my function to get the results of my ridge regression model?



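I'm a beginner just getting started with data. I'm trying to predict house prices with a regression model, practicing on the train.csv file from a Kaggle house-price exercise, and I'm currently exploring the data.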

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score, cross_val_predict, cross_validate
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import Ridge, LassoCV
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
%matplotlib inline
data = pd.read_csv('train.csv', index_col=0)
pd.set_option('display.max_rows', 80)
pd.set_option('display.max_columns', 80)
hp = data.copy()

# Fill missing values in numeric columns with 0; leave non-numeric columns for later
hp = hp.apply(lambda x: x.fillna(0) if x.dtype.kind in 'biufc' else x)
hp.isna().sum()

from sklearn.preprocessing import StandardScaler

# Numeric columns (this frame still contains the SalePrice target)
numeric_features = hp.select_dtypes(exclude=['object'])
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

# Categorical columns, with missing values replaced by the placeholder 'Z'
categorical_features = hp.select_dtypes('object')
categorical_features = categorical_features.fillna('Z')
categorical_transformer = OneHotEncoder(handle_unknown='ignore')

# ColumnTransformer expects column names, not DataFrames
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features.columns.drop('SalePrice')),
        ('cat', categorical_transformer, categorical_features.columns)])

y = numeric_features['SalePrice']
X = numeric_features.drop('SalePrice', axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)
# Fit the scaler on the training split only, so no test information leaks in
scaler = preprocessing.StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled  = scaler.transform(X_test)


def rmse_cv(model):
    # 5-fold cross-validated RMSE of the given model on the training data
    rmse = np.sqrt(-cross_val_score(model, X_train_scaled, y_train,
                                    scoring="neg_mean_squared_error", cv=5))
    return rmse

How do I implement my for loop and call my function so that I get the results of my ridge regression model?

coefs = []
rmse = []
alphas = [0.01, 0.05, 0.1, 0.3, 0.8, 1, 5, 10, 15, 30, 50]
for a in alphas:
    ridge = Ridge(alpha=a, fit_intercept=True)
    ridge.fit(X_train_scaled, y_train)
    coefs.append(ridge.coef_)
    rmse_cv()

The sklearn documentation for Ridge includes a method, predict, that returns the model's predictions.

Another online resource, from smith.edu, walks through an example of how to use that method.
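As a minimal sketch of that method (using made-up toy data rather than train.csv, so `X_toy`, `y_toy` and `X_new` are purely illustrative names): `predict` takes a 2-D array with the same columns the model was fitted on and returns one prediction per row.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data (hypothetical, not from train.csv): y = 2*x0 + 3*x1 exactly
X_toy = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_toy = 2 * X_toy[:, 0] + 3 * X_toy[:, 1]

ridge = Ridge(alpha=0.01, fit_intercept=True)
ridge.fit(X_toy, y_toy)

# Two new rows with the same two feature columns
X_new = np.array([[5.0, 5.0], [0.0, 1.0]])
pred = ridge.predict(X_new)

print(pred.shape)  # one prediction per row of X_new
```

Whatever transformation was applied to the training features (for example scaling) has to be applied to the rows passed to `predict` as well.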

Here is how to integrate it into your solution:

coefs = []
rmse = []
alphas = [0.01, 0.05, 0.1, 0.3, 0.8, 1, 5, 10, 15, 30, 50]
for a in alphas:
    ridge = Ridge(alpha=a, fit_intercept=True)
    ridge.fit(X_train_scaled, y_train)
    pred = ridge.predict(X_test_scaled)  # predict on the scaled test features
    coefs.append(ridge.coef_)
    rmse.append(np.sqrt(mean_squared_error(y_test, pred)))  # test RMSE (square root of the test MSE)
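If you would rather use the `rmse_cv` helper from the question than a single test split, the loop can pass each model to the function and store what it returns. The following is only a sketch under assumptions: it uses synthetic data from `make_regression` as a stand-in for train.csv, and it wraps the scaler and the model in `make_pipeline` so the scaling is refit on each cross-validation fold. The names `cv_rmse` and `best_alpha` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the house-price features (assumption: not train.csv)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5)


def rmse_cv(model):
    """Return the 5-fold cross-validated RMSE scores of `model` on the training set."""
    neg_mse = cross_val_score(model, X_train, y_train,
                              scoring="neg_mean_squared_error", cv=5)
    return np.sqrt(-neg_mse)


alphas = [0.01, 0.05, 0.1, 0.3, 0.8, 1, 5, 10, 15, 30, 50]
cv_rmse = []
for a in alphas:
    # The pipeline refits the scaler on the training part of every fold
    model = make_pipeline(StandardScaler(), Ridge(alpha=a, fit_intercept=True))
    cv_rmse.append(rmse_cv(model).mean())  # average RMSE over the 5 folds

# Alpha with the lowest average cross-validated RMSE
best_alpha = alphas[int(np.argmin(cv_rmse))]
print(best_alpha)
```

The key change from the question's loop is that `rmse_cv` receives the model as its argument and its return value is appended to a list, so there is one score per alpha to compare or plot.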
