How to get column counts together with the column names in a pandas DataFrame



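I'm doing data preprocessing and handling missing values. I want to apply a threshold to each column: if a column has fewer than 50 values (non-missing entries), drop that column.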

import numpy as np
import pandas as pd
from pandas import DataFrame
df = pd.read_csv('cbc_updated_1.csv')

Then I get the count for each column:

a = df.count(axis=0)
print(a)

This gives each column's name alongside its count:

IP ABN(RBC)RET Abn Scattergram       46
IP ABN(RBC)Reticulocytosis           23
IP ABN(PLT)Thrombocytosis            47
IP ABN(PLT)PLT Abn Scattergram        0
IP SUS(WBC)Blasts?                   57
IP SUS(WBC)Abn Lympho?               10
IP SUS(WBC)Left Shift?              190
IP SUS(WBC)Atypical Lympho?         126
IP SUS(RBC)RBC Agglutination?         0
IP SUS(RBC)Turbidity/HGB Interf?      9
IP SUS(RBC)Iron Deficiency?          27
IP SUS(RBC)HGB Defect?                3
IP SUS(RBC)Fragments?               168
IP SUS(PLT)PLT Clumps?               73
dtype: int64
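To see what `df.count(axis=0)` returns, here is a minimal sketch on a small made-up DataFrame. The column names `A`, `B`, `C` and the values are hypothetical, not taken from `cbc_updated_1.csv`. `count` counts the non-null (non-NaN) entries in each column and returns a Series whose index holds the column names. This is why the names appear next to the counts above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing value.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, 4.0],
    "B": [np.nan, np.nan, np.nan, 7.0],
    "C": [1.0, 2.0, 3.0, 4.0],
})

# Count non-missing values per column (axis=0 is the default).
counts = df.count(axis=0)
print(counts)

# The column names live in the Series index, the counts in its values.
print(list(counts.index))  # ['A', 'B', 'C']
print(counts.tolist())     # [3, 1, 4]
```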

Next I want to loop over the counts above to check my threshold condition, but I can't get it to work. I tried the following code:

for i in a:
    if i < 50:
        print(i)

As a result I only get the values, not the column names. I need both.

46
23
47
0
10
0
9
27
3
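Iterating directly over a Series yields only its values; the column names sit in its index. A minimal fix for the loop above is `Series.items()`, which yields `(label, value)` pairs. The sketch below uses hypothetical toy counts (`col_x`, `col_y`, `col_z`) in place of the real ones.

```python
import pandas as pd

# Hypothetical per-column counts, shaped like the output of df.count().
a = pd.Series({"col_x": 46, "col_y": 190, "col_z": 0})

below = []
# items() yields (index label, value) pairs, so the column name comes along.
for name, count in a.items():
    if count < 50:
        print(name, count)
        below.append(name)
```

This prints `col_x 46` and `col_z 0`, i.e. the name and the count for every column under the threshold.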

How can I get both the column names and their counts?

Try boolean indexing on the Series of counts:

>>> a[a < 50]
IP ABN(RBC)RET Abn Scattergram      46
IP ABN(RBC)Reticulocytosis          23
IP ABN(PLT)Thrombocytosis           47
IP ABN(PLT)PLT Abn Scattergram       0
IP SUS(WBC)Abn Lympho?              10
IP SUS(RBC)RBC Agglutination?        0
IP SUS(RBC)Turbidity/HGB Interf?     9
IP SUS(RBC)Iron Deficiency?         27
IP SUS(RBC)HGB Defect?               3
dtype: int64
>>> 
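Since the stated goal is to drop every column with fewer than 50 values, the same boolean mask can select which columns to keep. The sketch below uses hypothetical data and a threshold of 2 instead of 50 so the example stays small. `DataFrame.dropna(axis=1, thresh=...)` should give the same result, because `thresh` is the minimum number of non-NA values a column needs in order to be kept.

```python
import numpy as np
import pandas as pd

THRESHOLD = 2  # stands in for 50 in the question

# Hypothetical toy data.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [np.nan, np.nan, 5.0],
    "C": [1.0, 2.0, 3.0],
})

a = df.count(axis=0)  # A: 2, B: 1, C: 3

# Option 1: keep only columns whose non-null count meets the threshold.
kept = df.loc[:, a >= THRESHOLD]

# Option 2: drop the columns selected by the mask a[a < THRESHOLD].
dropped = df.drop(columns=a[a < THRESHOLD].index)

# Option 3: dropna keeps columns with at least `thresh` non-NA values.
via_dropna = df.dropna(axis=1, thresh=THRESHOLD)

print(list(kept.columns))  # ['A', 'C']
```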

If you want to loop over the results:

for x in a[a < 50].reset_index().to_numpy().tolist():
    print(*x)

IP ABN(RBC)RET Abn Scattergram 46
IP ABN(RBC)Reticulocytosis 23
IP ABN(PLT)Thrombocytosis 47
IP ABN(PLT)PLT Abn Scattergram 0
IP SUS(WBC)Abn Lympho? 10
IP SUS(RBC)RBC Agglutination? 0
IP SUS(RBC)Turbidity/HGB Interf? 9
IP SUS(RBC)Iron Deficiency? 27
IP SUS(RBC)HGB Defect? 3
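For reference, here is what that loop iterates over, sketched on hypothetical counts. `reset_index()` turns the filtered Series into a two-column DataFrame (the old index, i.e. the column names, plus the counts). `.to_numpy().tolist()` then turns each row into a `[name, count]` list, which `print(*x)` unpacks into two printed fields.

```python
import pandas as pd

# Hypothetical per-column counts.
a = pd.Series({"col_x": 46, "col_y": 190, "col_z": 0})

# Intermediate two-column DataFrame: names in one column, counts in the other.
frame = a[a < 50].reset_index()

# Each row becomes a [name, count] list, as in the answer's loop.
rows = frame.to_numpy().tolist()
for x in rows:
    print(*x)
```

Iterating with `a[a < 50].items()` produces the same name/count pairs without building the intermediate DataFrame.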
