Pandas beginner here. For a project, I have a DataFrame like this:
import numpy as np
import pandas as pd
data = [[np.nan, np.nan, np.nan, np.nan, "x", np.nan],
[np.nan, np.nan, np.nan, np.nan, "y", np.nan],
[np.nan, np.nan, np.nan, np.nan, "z", np.nan]]
cols = [("column1", "first"), ("column1", "last"), ("column2", "first"), ("column2", "last"), ("column3", "first"), ("column3", "last")]
df = pd.DataFrame(data, columns=cols)
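The columns come in pairs, and I want to drop every pair in which all elements of both columns are NaN. For the code above, I expect only the last column pair to remain. Expected result: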
column3/first | column3/last |
---|---|
x | nan |
y | nan |
z | nan |
First you need a MultiIndex in the columns:
df = pd.DataFrame(data, columns=pd.MultiIndex.from_tuples(cols))
# if the columns are not a MultiIndex yet, create one
#df = pd.DataFrame(data, columns=cols)
#df.columns = pd.MultiIndex.from_tuples(df.columns)
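As a quick sanity check before filtering, the snippet below is a minimal sketch of how one might confirm that the columns really are a two-level MultiIndex. The shortened `cols` and `data` are illustrative stand-ins, not the full frame from the question. Depending on the pandas version, tuple labels may already produce a MultiIndex, so the conversion is guarded.

```python
import numpy as np
import pandas as pd

# Illustrative, shortened version of the frame from the question.
cols = [("column1", "first"), ("column1", "last"),
        ("column3", "first"), ("column3", "last")]
data = [[np.nan, np.nan, "x", np.nan]]

df = pd.DataFrame(data, columns=cols)

# Convert only when the constructor did not already build a MultiIndex.
if not isinstance(df.columns, pd.MultiIndex):
    df.columns = pd.MultiIndex.from_tuples(df.columns)

# Two levels are expected: the pair name and the "first"/"last" part.
print(df.columns.nlevels)
print(df.columns.get_level_values(0).unique().tolist())
```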
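For each first-level value, test with GroupBy.transform whether all values are missing, then test with DataFrame.any whether each column has at least one True. Invert the masks with ~, chain them with | (bitwise OR), and filter the columns with DataFrame.loc: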
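# m1: True for pairs in which no row is NaN across the whole pair
m1 = ~df.isna().groupby(level=0, axis=1).transform('all').any()
# m2: True for columns that contain no NaN at all
m2 = ~df.isna().any()
# keep a column if either mask selects it
df = df.loc[:, m1 | m2]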
print(df)
  column3
    first last
0       x  NaN
1       y  NaN
2       z  NaN
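As an alternative, here is a minimal sketch that follows the wording of the question directly: a pair is kept if any cell in either of its columns is non-NaN. It avoids `groupby(..., axis=1)`, which, to my knowledge, is deprecated in pandas 2.1 and later. The names `has_data`, `pair_has_data`, `keep_labels`, and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples([
    ("column1", "first"), ("column1", "last"),
    ("column2", "first"), ("column2", "last"),
    ("column3", "first"), ("column3", "last"),
])
data = [[np.nan, np.nan, np.nan, np.nan, "x", np.nan],
        [np.nan, np.nan, np.nan, np.nan, "y", np.nan],
        [np.nan, np.nan, np.nan, np.nan, "z", np.nan]]
df = pd.DataFrame(data, columns=cols)

# One flag per column: does it hold at least one non-NaN value?
has_data = df.notna().any()

# One flag per pair (first column level): does either half hold data?
pair_has_data = has_data.groupby(level=0).any()

# Labels of the pairs to keep, then a boolean mask over all columns.
keep_labels = pair_has_data[pair_has_data].index
result = df.loc[:, df.columns.get_level_values(0).isin(keep_labels)]

print(result)
```

For the sample data this keeps only the column3 pair. Note that it can differ from the m1 | m2 approach on other data: a pair that has some rows fully NaN but other rows with values is still kept here as a whole.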