Pandas DataFrame: how to drop columns with NaN across column pairs



Pandas beginner here. For a project, I was given a DataFrame like this:

import numpy as np
import pandas as pd
data = [[np.nan, np.nan, np.nan, np.nan, "x", np.nan],
        [np.nan, np.nan, np.nan, np.nan, "y", np.nan],
        [np.nan, np.nan, np.nan, np.nan, "z", np.nan]]
cols = [("column1", "first"), ("column1", "last"), ("column2", "first"), ("column2", "last"), ("column3", "first"), ("column3", "last")]
df = pd.DataFrame(data, columns=cols)

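The columns come in pairs, and I want to drop every pair in which all elements of both columns are NaN. For the code above, I expect only the last column pair to remain. Expected result: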

column3/first  column3/last
x              nan
y              nan
z              nan

First, you need a MultiIndex in the columns:

df = pd.DataFrame(data, columns=pd.MultiIndex.from_tuples(cols))
# if the DataFrame already exists with tuple columns, convert them instead:
# df = pd.DataFrame(data, columns=cols)
# df.columns = pd.MultiIndex.from_tuples(df.columns)
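
As a quick sanity check before grouping by level, the sketch below confirms that the columns really are a two-level MultiIndex and lists the first-level names that define the pairs. The variable names `is_multi`, `n_levels` and `pair_names` are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

data = [[np.nan, np.nan, np.nan, np.nan, "x", np.nan],
        [np.nan, np.nan, np.nan, np.nan, "y", np.nan],
        [np.nan, np.nan, np.nan, np.nan, "z", np.nan]]
cols = [("column1", "first"), ("column1", "last"),
        ("column2", "first"), ("column2", "last"),
        ("column3", "first"), ("column3", "last")]
df = pd.DataFrame(data, columns=pd.MultiIndex.from_tuples(cols))

# Check the column index type and how many levels it has
is_multi = isinstance(df.columns, pd.MultiIndex)
n_levels = df.columns.nlevels

# The first level holds the pair names used for grouping
pair_names = df.columns.get_level_values(0).unique().tolist()
print(is_multi, n_levels, pair_names)
```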

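Then, for each first-level value, use GroupBy.transform to test whether all values are missing, and DataFrame.any to test whether there is at least one True per column. Build a second mask for columns that have no missing values, invert both masks with ~, chain them with | (bitwise OR), and filter the columns with DataFrame.loc: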

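# m1: True for columns whose pair has no row where both values are missing
m1 = ~df.isna().groupby(level=0, axis=1).transform('all').any()
# m2: True for columns that contain no missing values at all
m2 = ~df.isna().any()
# keep a column if either mask is True
df = df.loc[:, m1 | m2]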
print(df)
column3     
first last
0       x  NaN
1       y  NaN
2       z  NaN
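
Note that `groupby(..., axis=1)` is deprecated in recent pandas releases (2.1 and later). The following is a minimal alternative sketch, not the answer's original code, that avoids `axis=1` by reducing each column to a single boolean first and then grouping that Series by the first column level. It assumes the rule stated in the question: a pair is dropped only when every value in both of its columns is NaN. The names `has_value`, `keep` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

data = [[np.nan, np.nan, np.nan, np.nan, "x", np.nan],
        [np.nan, np.nan, np.nan, np.nan, "y", np.nan],
        [np.nan, np.nan, np.nan, np.nan, "z", np.nan]]
cols = [("column1", "first"), ("column1", "last"),
        ("column2", "first"), ("column2", "last"),
        ("column3", "first"), ("column3", "last")]
df = pd.DataFrame(data, columns=pd.MultiIndex.from_tuples(cols))

# One boolean per column: does it hold at least one non-missing value?
has_value = df.notna().any()

# Broadcast within each first-level pair: the whole pair is kept
# if either of its columns holds a value
keep = has_value.groupby(level=0).transform("any")

# Select columns positionally with the boolean mask
result = df.loc[:, keep.to_numpy()]
print(result)
```

On the sample data this selects the same two columns as the masks above, but the two approaches can differ on other inputs, because `m1` tests row by row whether both values of a pair are missing.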
