Boolean indexing a multi-index Pandas DataFrame to subset only the rows with null values



I have a multi-index df and I want to extract only the rows and columns that contain null values or, if possible, their positions (offsets):

                          week_1  week_2  week_3  week_4  week_5  week_6
Year
2000 Arizona Cardinals     loser  winner   loser   loser  winner   loser
     Atlanta Falcons      winner   loser  winner   loser   loser   loser
     Baltimore Ravens     winner     NaN   loser  winner  winner  winner
     Buffalo Bills           NaN  winner   loser   loser   loser  winner
     Carolina Panthers     loser  winner   loser   loser  winner   loser

So the ideal output would be:

# the full index and column location
(2000, Baltimore Ravens, week_2)
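If you need exactly these full locations, a minimal sketch is to take the positions of the True cells in df.isna() with NumPy and map them back to index labels and column names. The sample frame is rebuilt from the table above; the second index level name "Team" is an assumption, since the printout only shows "Year".

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the level name "Team" is assumed.
index = pd.MultiIndex.from_product(
    [[2000], ["Arizona Cardinals", "Atlanta Falcons", "Baltimore Ravens",
              "Buffalo Bills", "Carolina Panthers"]],
    names=["Year", "Team"],
)
df = pd.DataFrame(
    [
        ["loser", "winner", "loser", "loser", "winner", "loser"],
        ["winner", "loser", "winner", "loser", "loser", "loser"],
        ["winner", np.nan, "loser", "winner", "winner", "winner"],
        [np.nan, "winner", "loser", "loser", "loser", "winner"],
        ["loser", "winner", "loser", "loser", "winner", "loser"],
    ],
    index=index,
    columns=[f"week_{i}" for i in range(1, 7)],
)

# Row/column positions of every NaN cell, in row-major order.
rows, cols = np.nonzero(df.isna().to_numpy())

# df.index[r] is a (Year, Team) tuple; append the column name to it.
locations = [(*df.index[r], df.columns[c]) for r, c in zip(rows, cols)]
print(locations)
```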

Or, if that isn't possible, only the rows that contain NaN values:

                          week_1  week_2  week_3  week_4  week_5  week_6
Year
2000 Baltimore Ravens     winner     NaN   loser  winner  winner  winner
     Buffalo Bills           NaN  winner   loser   loser   loser  winner

I tried something like:

idx = pd.IndexSlice
x = df.loc[idx[:, :], idx['week_1':'week_16']].isnull()

and then df[x] or df.loc[x], but I get a DataFrame containing nothing but NaN values:
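The all-NaN result is expected: indexing with a boolean DataFrame masks element-wise (it behaves like df.where(x)), so every cell where x is False becomes NaN, and the cells where x is True were NaN to begin with. A sketch of the row-level version reduces the mask to one boolean per row with any(axis=1), and optionally to one per column with any(axis=0). The sample data is rebuilt here so the snippet runs on its own, again assuming the second level is named "Team".

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the level name "Team" is assumed.
index = pd.MultiIndex.from_product(
    [[2000], ["Arizona Cardinals", "Atlanta Falcons", "Baltimore Ravens",
              "Buffalo Bills", "Carolina Panthers"]],
    names=["Year", "Team"],
)
df = pd.DataFrame(
    [
        ["loser", "winner", "loser", "loser", "winner", "loser"],
        ["winner", "loser", "winner", "loser", "loser", "loser"],
        ["winner", np.nan, "loser", "winner", "winner", "winner"],
        [np.nan, "winner", "loser", "loser", "loser", "winner"],
        ["loser", "winner", "loser", "loser", "winner", "loser"],
    ],
    index=index,
    columns=[f"week_{i}" for i in range(1, 7)],
)

mask = df.isna()

# One boolean per row: True if the row has at least one NaN.
rows_with_nan = df[mask.any(axis=1)]

# Rows AND columns that contain at least one NaN.
subset = df.loc[mask.any(axis=1), mask.any(axis=0)]
print(rows_with_nan)
print(subset)
```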

                          week_1  week_2  week_3  week_4  week_5  week_6  week_7
Year
2000 Arizona Cardinals       NaN     NaN     NaN     NaN     NaN     NaN     NaN
     Atlanta Falcons         NaN     NaN     NaN     NaN     NaN     NaN     NaN
     Baltimore Ravens        NaN     NaN     NaN     NaN     NaN     NaN     NaN
     Buffalo Bills           NaN     NaN     NaN     NaN     NaN     NaN     NaN
     Carolina Panthers       NaN     NaN     NaN     NaN     NaN     NaN     NaN

Assuming you're on pandas 0.25 or later, you can use explode:

s = (df.apply(lambda row: row[row.isna()].index, axis=1)
       .explode()
       .dropna())

Result:

Year  Team
2000  Baltimore Ravens    week_2
      Buffalo Bills       week_1

How it works:

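  • apply iterates over each row and collects the names of its NA columns. The result for each row is a possibly empty list, since a row can have anywhere from zero to many NA columns.
  • explode turns the list of column names embedded in each row into rows of their own, repeating the index as needed. An empty list becomes a single NaN entry.
  • dropna removes those NaN entries, i.e. the rows that had no NA columns.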
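For checking the steps above, here is a self-contained version of the explode approach on the sample data. It is a sketch assuming pandas 0.25+ and the assumed level name "Team"; it calls .tolist() so that apply returns plain lists, matching the description in the first bullet, and it flattens the result into the (Year, Team, week) tuples from the question.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the level name "Team" is assumed.
index = pd.MultiIndex.from_product(
    [[2000], ["Arizona Cardinals", "Atlanta Falcons", "Baltimore Ravens",
              "Buffalo Bills", "Carolina Panthers"]],
    names=["Year", "Team"],
)
df = pd.DataFrame(
    [
        ["loser", "winner", "loser", "loser", "winner", "loser"],
        ["winner", "loser", "winner", "loser", "loser", "loser"],
        ["winner", np.nan, "loser", "winner", "winner", "winner"],
        [np.nan, "winner", "loser", "loser", "loser", "winner"],
        ["loser", "winner", "loser", "loser", "winner", "loser"],
    ],
    index=index,
    columns=[f"week_{i}" for i in range(1, 7)],
)

s = (
    # For each row, a (possibly empty) list of the names of its NaN columns.
    df.apply(lambda row: row.index[row.isna().to_numpy()].tolist(), axis=1)
    # One output row per list element; an empty list becomes a single NaN.
    .explode()
    # Drop those NaN placeholders, i.e. the rows without any NaN column.
    .dropna()
)

# Flatten to (Year, Team, week) tuples like the question's ideal output.
locations = [(*idx, week) for idx, week in s.items()]
print(s)
print(locations)
```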
