Index not showing in dataframe - need to display corresponding index then delete columns based on threshold using Pandas

I have only just started learning Python, so any help is greatly appreciated.

The overall goal here is data exploration and data cleaning.

I wrote the function below, which returns a dataframe showing the percentage of missing values in each column.

import pandas as pd

def missing_values_table(df):
    # Count nulls per column: isnull() gives a boolean mask, sum() counts the True values.
    missing_vals = df.isnull().sum()
    # Convert the counts to a percentage of all rows.
    percent_conversion = 100 * missing_vals / len(df)
    # Place the two Series side by side; the resulting columns are labeled 0 and 1.
    combined_table = pd.concat([missing_vals, percent_conversion], axis=1)
    # Give the columns descriptive labels.
    table_renamed = combined_table.rename(
        columns={0: 'Missing Values', 1: 'Percentage'})
    # Sort descending by percentage of missing values.
    table_renamed.sort_values(['Percentage'], ascending=False, inplace=True)
    return table_renamed

Problematic output (it is missing the index that would show me where each column sits in the original dataframe, which is massive):

                          Missing Values  Percentage
Engine_Horsepower                 375906   93.712932
Pushblock                         375906   93.712932
Enclosure_Type                    375906   93.712932
Blade_Width                       375906   93.712932
[...]

Desired output:

                          Missing Values  Percentage
32 Engine_Horsepower                 375906   93.712932
15 Pushblock                         375906   93.712932
3  Enclosure_Type                    375906   93.712932
17 Blade_Width                       375906   93.712932
[...]

The numbers correspond to the column positions in the original dataframe, before sorting.

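Only after examining these columns individually to confirm they can be removed will I drop columns based on a threshold (50% null values means delete).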

To preserve the integer positions of the columns, turn the columns into a MultiIndex:

df.columns = pd.MultiIndex.from_arrays([range(len(df.columns)), df.columns])

Subsequent filtering and summaries will then keep the positions:

threshold = .4
df[df.columns[df.isnull().mean() < threshold]]
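As a rough illustration of the two steps above, here is a minimal sketch on a small, made-up dataframe (the column names and values are hypothetical, not from the question's data). It selects columns with `df.loc` and a boolean mask, which should have the same effect as the one-liner above. The threshold of 0.4 follows the answer; the question itself mentions 50%, so adjust as needed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "b" is 75% null, "c" is 25% null, "a" has no nulls.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, np.nan, 3.0, 4.0],
})

# Attach each column's original integer position as the first index level.
df.columns = pd.MultiIndex.from_arrays([range(len(df.columns)), df.columns])

threshold = 0.4
# Fraction of null values per column (between 0 and 1).
null_fraction = df.isnull().mean()
# Keep only the columns whose null fraction is below the threshold.
mask = (null_fraction < threshold).to_numpy()
kept = df.loc[:, mask]

# Each surviving column still carries its original position.
print(list(kept.columns))
```

Because the position travels with the column label, it survives any later filtering or sorting of the columns.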

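This does the first part of your function (note that mean() yields a fraction between 0 and 1, not a percentage scaled by 100):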

df_null_summary = pd.concat([df.isnull().sum(), df.isnull().mean()], axis=1, keys=['Missing Values', 'Percentage'])
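If changing the column labels of the original dataframe is undesirable, one possible alternative (not part of the answer above) is to record the position as an ordinary column of the summary table. The sketch below assumes a hypothetical helper name, `missing_values_with_position`, and made-up toy data. It also assumes "50% null values" means a null rate of 50% or more.

```python
import numpy as np
import pandas as pd

def missing_values_with_position(df):
    """Summarize nulls per column, keeping each column's original position."""
    is_null = df.isnull()
    summary = pd.DataFrame(
        {
            # Integer position of each column in the original dataframe.
            "Position": np.arange(df.shape[1]),
            "Missing Values": is_null.sum().to_numpy(),
            "Percentage": (100 * is_null.mean()).to_numpy(),
        },
        index=df.columns,
    )
    # Sort so the columns with the most missing data come first.
    return summary.sort_values("Percentage", ascending=False)

# Hypothetical toy data: "b" is 75% null, "c" is 25% null, "a" has no nulls.
df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [1.0, np.nan, 3.0, 4.0],
})

summary = missing_values_with_position(df)
print(summary)

# After reviewing the summary, drop columns at or above the assumed 50% cutoff.
to_drop = summary.index[summary["Percentage"] >= 50]
cleaned = df.drop(columns=to_drop)
```

Keeping the position in a regular column leaves the original dataframe untouched, so the review step and the drop step stay separate.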
