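I have a MultiIndex df and I want to extract only the rows and columns that contain null values or, if possible, the location (index and column) of each null: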
                        week_1  week_2  week_3  week_4  week_5  week_6
Year
2000 Arizona Cardinals   loser  winner   loser   loser  winner   loser
     Atlanta Falcons    winner   loser  winner   loser   loser   loser
     Baltimore Ravens   winner     NaN   loser  winner  winner  winner
     Buffalo Bills         NaN  winner   loser   loser   loser  winner
     Carolina Panthers   loser  winner   loser   loser  winner   loser
So the ideal output would be:
#the entire index and column location
(2000, Baltimore Ravens, week_2)
Or, if that is not possible, just the rows that contain NaN values:
                        week_1  week_2  week_3  week_4  week_5  week_6
Year
2000
     Baltimore Ravens   winner     NaN   loser  winner  winner  winner
     Buffalo Bills         NaN  winner   loser   loser   loser  winner
I tried something like this:
idx = pd.IndexSlice
x = df.loc[idx[:, :], idx['week_1':'week_16']].isnull()
and then df[x] or df.loc[x], but I get a DataFrame containing only NaN values:
                        week_1  week_2  week_3  week_4  week_5  week_6  week_7
Year
2000 Arizona Cardinals     NaN     NaN     NaN     NaN     NaN     NaN     NaN
     Atlanta Falcons       NaN     NaN     NaN     NaN     NaN     NaN     NaN
     Baltimore Ravens      NaN     NaN     NaN     NaN     NaN     NaN     NaN
     Buffalo Bills         NaN     NaN     NaN     NaN     NaN     NaN     NaN
     Carolina Panthers     NaN     NaN     NaN     NaN     NaN     NaN     NaN
Assuming you are on pandas 0.25 or later, you can use explode:
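s = (
    df.apply(lambda row: row[row.isna()].index, axis=1)  # NaN column labels per row
      .explode()  # one output row per (index, column label) pair
      .dropna()   # drop rows that had no NaN columns
)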
Result:
Year  Team
2000  Baltimore Ravens    week_2
      Buffalo Bills       week_1
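What it does:
apply iterates over each row and collects the names of its NaN columns. This yields a possibly empty collection per row, because a row can have anywhere from zero to many NaN columns.
explode turns the column names embedded in each row into rows of their own, repeating the index as needed.
dropna removes the rows that have no NaN columns.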
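As a rough illustration of those three steps, here is a self-contained sketch on a small, made-up frame that mirrors the question's layout (three teams, three weeks). The sample data, the variable names per_row, exploded, locations and rows_with_nan, and the added .tolist() call are assumptions for illustration only. The last two statements are not part of the one-liner above; they show one possible way to get the tuple form and the rows-only fallback that the question asks about.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the shape of the question's table.
index = pd.MultiIndex.from_product(
    [[2000], ["Arizona Cardinals", "Baltimore Ravens", "Buffalo Bills"]],
    names=["Year", "Team"],
)
df = pd.DataFrame(
    [
        ["loser", "winner", "loser"],
        ["winner", np.nan, "loser"],
        [np.nan, "winner", "loser"],
    ],
    index=index,
    columns=["week_1", "week_2", "week_3"],
)

# Step 1: for each row, collect the labels of the columns that are NaN.
# .tolist() (an addition here) makes each cell a plain Python list.
per_row = df.apply(lambda row: row[row.isna()].index.tolist(), axis=1)

# Step 2: give every collected label its own row; an empty list becomes NaN.
exploded = per_row.explode()

# Step 3: drop the rows that had no NaN columns at all.
s = exploded.dropna()

# Optional: turn the result into (Year, Team, column) tuples.
locations = [(year, team, col) for (year, team), col in s.items()]
print(locations)

# Rows-only fallback: keep each row that has at least one NaN.
rows_with_nan = df[df.isna().any(axis=1)]
print(rows_with_nan)
```

On the df[x] attempt in the question: indexing a DataFrame with a boolean DataFrame keeps values only where the mask is True and fills everything else with NaN. Since the mask here is True exactly where the value is already NaN, every cell ends up NaN. Reducing the mask to one boolean per row with any(axis=1), as in the last step of the sketch, selects whole rows instead.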