Data mining for machine learning



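I'm getting started with data analysis and ran into a problem while redoing a Kaggle exercise. Using the file 'ENB.csv', I import my data, look at the correlations, and create a new column in my DataFrame that sums my target variables: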

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn import model_selection
from sklearn.model_selection import validation_curve
from sklearn import ensemble
from sklearn import svm
from sklearn import neighbors
from sklearn.model_selection import train_test_split 
from sklearn import preprocessing
from sklearn.ensemble import VotingClassifier

df = pd.read_csv('ENB.csv')
df.columns = ["relative_compactness", "surface_area", "wall_area", "roof_area", "overall_height", "orientation",
              "glazing_area", "glazing_area_dist", "heating_load", "cooling_load"]
df.head()

# Pearson correlation matrix, computed once and reused for the heatmap
corr = df.corr(method='pearson')
plt.figure(figsize=(20, 10))
sns.heatmap(corr, annot=True, cmap='Greens');

# Total energy load: the sum of the two target variables
df['total_charges'] = df['heating_load'] + df['cooling_load']


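Next I need to create a new variable, 'charge_classes', that uses the 3 quartile cut points (25th, 50th and 75th percentiles) of the new total_charges column to split the buildings into 4 classes labeled 0, 1, 2, 3. I've searched but can't find a solution. Can anyone help? This is what I've tried: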

# One-hot encodes each distinct value of total_charges; it does not bin the values into quartiles
charge_classes = pd.get_dummies(df['total_charges'])
charge_classes
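To see why this attempt can't work, here is a minimal sketch on a few made-up values (the toy Series below is hypothetical and only stands in for df['total_charges']). pd.get_dummies produces one indicator column per distinct value, so on a continuous column you get roughly as many columns as rows instead of 4 classes:

```python
import pandas as pd

# Hypothetical toy values standing in for df['total_charges']
total_charges = pd.Series([20.5, 35.0, 35.0, 60.2, 80.1], name="total_charges")

# get_dummies one-hot encodes: one indicator column per distinct value,
# so the values are never grouped into quantile ranges
dummies = pd.get_dummies(total_charges)
print(dummies.shape)  # (5, 4): 5 rows, one column per distinct value
```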

You can use pandas.qcut, which splits the values into 4 equal-frequency bins at the quartiles:

# labels=False returns the integer bin codes 0, 1, 2, 3 instead of Interval labels
df['charge_classes'] = pd.qcut(df['total_charges'], 4, labels=False)
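A small sketch to check what qcut does, using hypothetical toy values in place of the real total_charges column. Passing retbins=True also returns the bin edges (minimum, the three quartile cut points, maximum), and the same classes can be rebuilt by hand with np.quantile and pd.cut:

```python
import numpy as np
import pandas as pd

# Hypothetical toy values standing in for df['total_charges']
total_charges = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])

# retbins=True also returns the edges: min, the 25%/50%/75% cut points, max
classes, edges = pd.qcut(total_charges, 4, labels=False, retbins=True)
print(classes.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3]
print(edges)

# Same classes by hand: the 3 quartile cut points, then right-closed bins
cuts = np.quantile(total_charges, [0.25, 0.5, 0.75])
manual = pd.cut(total_charges, bins=[-np.inf, *cuts, np.inf], labels=False)
print((manual == classes).all())  # True
```

If the column contains many repeated values, two quartile edges can coincide and qcut raises a ValueError; passing duplicates='drop' avoids the error, but you then get fewer than 4 classes.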
