Problem when trying to replace outliers in pandas



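OK, so I'm trying to clean data for a machine learning project. I'm using the Z-score to detect outliers. The dataset contains different types of glass (types 1-7). For each glass type, I want to find the outliers and replace them with the mean sodium content ("Na" column) of that glass type. Strangely, the algorithm works for glass types 1 and 2, but when it reaches type 3 it raises an error (a KeyError, according to the traceback below). Do you know what the problem is?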
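For reference, this is the per-type z-score being thresholded. Within one glass type with n rows of sodium values x_1, …, x_n, each value is standardized against its own group; scipy.stats.zscore uses the population standard deviation (ddof=0) by default, and the threshold 1.99 is the one from the code below:

```latex
z_i = \frac{x_i - \bar{x}}{\sigma}, \qquad
\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{j=1}^{n} \left(x_j - \bar{x}\right)^2}, \qquad
x_i \text{ is flagged as an outlier if } \lvert z_i \rvert > 1.99
```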

import numpy as np
from scipy import stats

# DataFrame holds the glass dataset (loaded earlier, not shown)
z = stats.zscore(DataFrame.Na)
threshold = 1.99
for t in DataFrame.Type.unique():
    z = stats.zscore(DataFrame.Na[DataFrame.Type==t])
    print([DataFrame.Na[DataFrame.Type==t][(np.abs(z) > threshold)]])
    DataFrame.Na[DataFrame.Type==t] = DataFrame.Na[DataFrame.Type==t].replace([DataFrame.Na[DataFrame.Type==t][(np.abs(z) > threshold)]],np.mean(DataFrame.Na[DataFrame.Type==t]))

The output is:

[17    14.36
21    14.77
Name: Na, dtype: float64]
[70     14.86
105    11.45
106    10.73
108    14.43
110    11.23
111    11.02
Name: Na, dtype: float64]
[149    12.16
Name: Na, dtype: float64]
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
if __name__ == '__main__':
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:9: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
if __name__ == '__main__':
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897             try:
-> 2898                 return self._engine.get_loc(casted_key)
2899             except KeyError as err:
KeyError: 0
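The traceback ends inside Index.get_loc with KeyError: 0, i.e. pandas looked up the label 0 in an index that does not contain it. A minimal illustration of one common way this happens: on a Series with an integer index, `[]` is a label lookup, not a position lookup. The single-row Series below is made up to mirror the type-3 output above (label 149); this reproduces the same kind of error but is not confirmed to be the exact call that fails inside `replace`.

```python
import pandas as pd

# Hypothetical one-row subset, labelled like the type-3 outlier in the output above
subset = pd.Series([12.16], index=[149], name="Na")

# With an integer index, subset[0] asks for the *label* 0, which does not exist
try:
    subset[0]
    lookup_error = None
except KeyError as err:
    lookup_error = err

# Position-based access works regardless of what the labels are
first_value = subset.iloc[0]
print(repr(lookup_error), first_value)
```

Using `.iloc` for positions and `.loc` for labels makes it explicit which kind of lookup is intended.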

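Does anyone know what might be wrong here? If you need any additional information, I'll provide it. I've been thinking about this for about 2 hours and have no clue…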

I can't comment yet, so I'll post my comment as an answer.

Are you trying to detect "outliers"? Or "outliners"? I'm not just being pedantic here, since they are different statistical concepts.

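What's happening is that, somewhere, you're trying to set the value of row 0 in a DataFrame that has no row 0. Try splitting your long line into separate steps and printing the intermediate results to the console; that way you may find the error.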
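A minimal sketch of that advice, assuming the goal from the question (replace per-type Na outliers with the type's mean, with the mean taken before replacing, as the original loop does). Each step is on its own line, the intermediate outliers are printed, and the write-back uses a single `.loc` call on the frame itself instead of the chained `DataFrame.Na[...] = ...` that triggers the SettingWithCopyWarning. The function name `replace_outliers_with_group_mean` and the small data frame are hypothetical, and the z-score is computed directly with pandas (ddof=0, matching scipy.stats.zscore's default) rather than calling scipy.

```python
import pandas as pd


def replace_outliers_with_group_mean(df, value_col="Na", group_col="Type", threshold=1.99):
    """Return a copy of df in which, within each group, values whose |z-score|
    exceeds threshold are replaced by that group's mean (taken before replacing)."""
    out = df.copy()
    for t in out[group_col].unique():
        # Step 1: select this group's values with a single .loc call
        values = out.loc[out[group_col] == t, value_col]
        # Step 2: population z-scores (ddof=0); a constant group gives NaN, never flagged
        z = (values - values.mean()) / values.std(ddof=0)
        # Step 3: the outliers keep their original row labels
        outliers = values[z.abs() > threshold]
        print(t, outliers.to_dict())  # inspect the intermediate result
        # Step 4: write back by label on the frame itself (no chained assignment)
        out.loc[outliers.index, value_col] = values.mean()
    return out


# Small made-up frame: one obvious outlier (16.0) in type 1
df = pd.DataFrame({
    "Type": [1, 1, 1, 1, 1, 1, 2, 2],
    "Na": [13.0, 13.0, 13.0, 13.0, 13.0, 16.0, 12.0, 12.5],
})
cleaned = replace_outliers_with_group_mean(df)
```

Because the outliers are located and assigned by their row labels, nothing depends on a group containing row 0, and the original frame is left untouched.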
