Removing outliers from numerical features



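Hi, I am trying to remove outliers from the numerical columns of my dataset, but when I run my code the entire dataset gets deleted. Can anyone tell me what I am doing wrong, please?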

numerical_columns = data.select_dtypes(include=['int64','float64']).columns.tolist()
print('Number of rows before discarding outlier = %d' % (data.shape[0]))
for i in numerical_columns:
    q1 = data[i].quantile(0.25)
    q3 = data[i].quantile(0.75)
    iqr = q3-q1 #Interquartile range
    fence_low = q1-1.5*iqr
    fence_high = q3+1.5*iqr
    data = data.loc[(data[i] > fence_low) & (data[i] < fence_high)]
print('Number of rows after discarding outlier = %d' % (data.shape[0]))

The code below has worked for me. Here, col is the numerical column of the DataFrame from which outliers need to be removed.

import numpy as np

# Remove outliers: keep only the rows whose value in `col` lies within
# 3 standard deviations of the column mean
df = df[np.abs(df[col] - df[col].mean()) <= (3 * df[col].std())]
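For the IQR approach in the question, one possible cause of losing every row (an assumption, since the data is not shown) is a column whose interquartile range is zero, for example a mostly constant or flag-like column. In that case fence_low equals fence_high, and the strict `>` and `<` comparisons cannot both be true, so no row survives. Missing values also fail both comparisons. The sketch below uses inclusive bounds and computes all fences from the original data before filtering. The function name `drop_iqr_outliers`, the parameter `k`, and the sample frame are hypothetical and only for illustration.

```python
import pandas as pd


def drop_iqr_outliers(data: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose numeric values all lie within [q1 - k*iqr, q3 + k*iqr]."""
    numeric = data.select_dtypes(include=["int64", "float64"])
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1  # interquartile range, one value per column
    fence_low = q1 - k * iqr
    fence_high = q3 + k * iqr
    # Inclusive comparisons: a column with iqr == 0 keeps rows equal to its quartile value.
    within = numeric.ge(fence_low, axis="columns") & numeric.le(fence_high, axis="columns")
    # Assumption: a missing value is not treated as an outlier.
    within = within | numeric.isna()
    return data.loc[within.all(axis=1)]


# Hypothetical sample: "b" is constant, so its IQR is zero.
sample = pd.DataFrame({
    "a": [1, 2, 3, 4, 100],
    "b": [0, 0, 0, 0, 0],
    "c": ["x", "y", "z", "w", "v"],
})
cleaned = drop_iqr_outliers(sample)
print('Number of rows after discarding outlier = %d' % (cleaned.shape[0]))
```

With this sample, only the row where "a" is 100 is dropped. The loop from the question would discard every row once it reaches column "b", because no value is strictly between two equal fences.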
