How do I drop float features whose percentage of NaN is above a certain number?



I am trying to drop a feature if it is a float column and its share of missing values is above a certain number.

I tried:

# Define threshold to 1/6
threshold = 0.1667
# Drop float > threshold
for f in data:
    if data[f].dtype==float & data[f].isnull().sum() / data.shape[0] > threshold: del data[f]

...this raises an error:

TypeError: unsupported operand type(s) for &: 'type' and 'numpy.float64'

Any help would be appreciated.

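Use DataFrame.select_dtypes to select only the float columns, check for missing values and take the mean (sum/count) to get the missing share per column, add the other non-float columns back with Series.reindex, and finally filter by boolean indexing with the inverse condition (<= instead of >), so that the columns to keep are selected: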

import numpy as np
import pandas as pd
np.random.seed(2019)
df = pd.DataFrame(np.random.choice([np.nan,1], p=(0.2,0.8),size=(10,10))).assign(A='a')
print(df)
     0    1    2    3    4    5    6    7    8    9  A
0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
1  1.0  1.0  NaN  1.0  NaN  1.0  NaN  1.0  1.0  1.0  a
2  1.0  1.0  1.0  1.0  1.0  NaN  1.0  NaN  1.0  1.0  a
3  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  NaN  1.0  a
4  1.0  NaN  1.0  1.0  1.0  1.0  1.0  NaN  1.0  1.0  a
5  1.0  1.0  1.0  1.0  1.0  1.0  NaN  1.0  1.0  1.0  a
6  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
7  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
8  1.0  NaN  1.0  1.0  1.0  1.0  NaN  1.0  1.0  1.0  a
9  NaN  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  NaN  a
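threshold = 0.1667
# Share of missing values per float column; non-float columns are filled with False,
# which passes the <= test below, so they are always kept
df1 = df.select_dtypes(float).isnull().mean().reindex(df.columns, fill_value=False)
# Keep only the columns whose missing share is at most the threshold
df = df.loc[:, df1 <= threshold]
print(df)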
     0    2    3    4    5    8    9  A
0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
1  1.0  NaN  1.0  NaN  1.0  1.0  1.0  a
2  1.0  1.0  1.0  1.0  NaN  1.0  1.0  a
3  1.0  1.0  1.0  1.0  1.0  NaN  1.0  a
4  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
5  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
6  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
7  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
8  1.0  1.0  1.0  1.0  1.0  1.0  1.0  a
9  NaN  1.0  1.0  1.0  1.0  1.0  NaN  a
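
For comparison, the loop from the question can also be repaired directly. The TypeError comes from operator precedence: `&` binds more tightly than `==` and `>`, and `/` more tightly than `&`, so Python tries to evaluate `float & (number of NaN / number of rows)` first. The sketch below is only an illustration and not part of the answer above: the helper name `drop_sparse_float_columns` and the small sample frame are assumptions made for the example. It uses `and` for the scalar conditions and collects the columns first instead of deleting them while iterating.

```python
import numpy as np
import pandas as pd


def drop_sparse_float_columns(data: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Return a copy of `data` without float columns whose NaN share exceeds `threshold`.

    Hypothetical helper for illustration; the name is not from the original post.
    """
    to_drop = [
        col
        for col in data.columns
        # `and` works on plain booleans, so no precedence surprises as with `&`
        if pd.api.types.is_float_dtype(data[col])
        and data[col].isnull().mean() > threshold
    ]
    # Drop in one step rather than deleting columns during iteration
    return data.drop(columns=to_drop)


# Small made-up frame: "x" has 2 of 6 values missing, "y" has 1 of 6,
# "z" is a non-float column that also contains missing values
df = pd.DataFrame({
    "x": [1.0, np.nan, np.nan, 4.0, 5.0, 6.0],
    "y": [1.0, 2.0, 3.0, np.nan, 5.0, 6.0],
    "z": [None, None, "a", "b", "c", "d"],
})

result = drop_sparse_float_columns(df, threshold=0.1667)
print(result.columns.tolist())
```

Here "x" is removed because 2/6 is above the threshold, "y" stays because 1/6 is just below 0.1667, and "z" stays because it is not a float column. The input frame is left unchanged, since `DataFrame.drop` returns a new object by default.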
