Capping outliers



I have a DataFrame with 3 numeric variables, and I am trying to cap the outliers at the 0.01 and 0.99 quantiles, but it isn't working.

df[['TotalVisits', 'Total Time Spent on Website', 
'Page Views Per Visit']].describe(percentiles=[.25, .5, .75, .90, .95, .99])

This is the output.

Now I am trying to cap the outliers like this:

q_l = df['TotalVisits'].quantile(0.00)
q_h = df['TotalVisits'].quantile(0.99)
df['TotalVisits'][df['TotalVisits']<= q_l] = q_l
df['TotalVisits'][df['TotalVisits']>= q_h] = q_h

But the output stays the same, instead of the maximum becoming 17.

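You are doing a chained indexing assignment, which is why it fails: df['TotalVisits'][mask] = value first selects the column and then assigns into that selection, and the selection may be a temporary copy, so the change never reaches df.

Here is your code, fixed to use a single .loc call: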

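# Note: quantile(0.00) is simply the column minimum; use 0.01 for the 1st percentile mentioned in the question
q_l = df['TotalVisits'].quantile(0.00)
q_h = df['TotalVisits'].quantile(0.99)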
df.loc[df['TotalVisits']<= q_l, 'TotalVisits'] = q_l
df.loc[df['TotalVisits']>= q_h, 'TotalVisits'] = q_h
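
If you want to check the .loc version in isolation, here is a minimal sketch on made-up data. The values below are hypothetical and are not your dataset; only the column name is taken from your question. The column is kept as float so that assigning the (float) quantile does not clash with an integer dtype.

```python
import pandas as pd

# Hypothetical toy data, not the asker's dataset
df = pd.DataFrame(
    {"TotalVisits": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 250.0]}
)

# Upper cap: the 0.99 quantile of the column
q_h = df["TotalVisits"].quantile(0.99)

# One .loc call: row mask and column label in a single indexing step,
# so the assignment is applied to df itself
df.loc[df["TotalVisits"] >= q_h, "TotalVisits"] = q_h

capped_max = df["TotalVisits"].max()
print(capped_max == q_h)
```

After the assignment the column maximum equals q_h, which is the behaviour you expected from your original code.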

You can also simplify it with the pandas clip method:

df['TotalVisits'] = df['TotalVisits'].clip(lower = q_l, upper = q_h)
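
Since you have three numeric columns, the same idea can be applied to all of them at once. This is only a sketch under assumptions: the data is made up, and I am using the 0.01 and 0.99 quantiles from the text of your question. Passing the per-column quantiles as Series together with axis=1 makes clip match each bound to its column by label.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question
df = pd.DataFrame({
    "TotalVisits": [0.0, 1.0, 2.0, 3.0, 4.0, 250.0],
    "Total Time Spent on Website": [0.0, 10.0, 200.0, 350.0, 900.0, 2300.0],
    "Page Views Per Visit": [0.0, 1.0, 1.5, 2.0, 3.0, 55.0],
})

cols = ["TotalVisits", "Total Time Spent on Website", "Page Views Per Visit"]

# Per-column bounds: each is a Series indexed by column name
lower = df[cols].quantile(0.01)
upper = df[cols].quantile(0.99)

# axis=1 aligns the bounds with the columns, so every column
# is clipped to its own quantiles
df[cols] = df[cols].clip(lower=lower, upper=upper, axis=1)

print(df[cols].describe())
```

The bounds are computed before clipping, so each column is capped against its original distribution rather than against already-capped values.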
