Select only the NaN rows of a column in a DataFrame



movie_rating_T.iloc[:,5:6]

critic  Toby
title   
Just My Luck    NaN
Lady in the Water   NaN
Snakes on a Plane   4.5
Superman Returns    4.0
The Night Listener  NaN
You Me and Dupree   1.0

Suppose I only want to select the rows where the value is NaN:

Just My Luck
Lady in the Water
The Night Listener

How can I extract only the NaN values from the DataFrame?

critic  Toby
title   
Just My Luck    NaN
Lady in the Water   NaN
The Night Listener  NaN

.["title"] does not work.

========================================================================

movie_rating_T.iloc[:,5:6]

critic  Toby
title   
Just My Luck    NaN
Lady in the Water   NaN
Snakes on a Plane   4.5
Superman Returns    4.0
The Night Listener  NaN
You Me and Dupree   1.0

df_MovieRatingT[df_MovieRatingT['Toby'].isnull()]

critic  Toby
title   
Just My Luck    NaN
Lady in the Water   NaN
The Night Listener  NaN
========================================================================

df = DataFrame(ratings)

    critic  title   rating
0   Jack Matthews   Lady in the Water   3.0
1   Jack Matthews   Snakes on a Plane   4.0
2   Jack Matthews   You Me and Dupree   3.5
3   Jack Matthews   Superman Returns    5.0

I want to produce:

critic  Claudia Puig    Gene Seymour    Jack Matthews   Lisa Rose   Mick LaSalle    Toby
title                       
Just My Luck    3.0 1.5 NaN 3.0 2.0 NaN
Lady in the Water   NaN 3.0 3.0 2.5 3.0 NaN
Snakes on a Plane   3.5 3.5 4.0 3.5 4.0 4.5
Superman Returns    4.0 5.0 5.0 3.5 3.0 4.0
The Night Listener  4.5 3.0 3.0 3.0 3.0 NaN
You Me and Dupree   2.5 3.5 3.5 2.5 2.0 1.0

I used:

movie_rating= ratings.pivot(index='critic', columns='title',values='rating')

But it put title and critic on the same column. How can I fix this?

You can use isnull from pandas:

df[df['Your column with NaN'].isnull()]

This returns the rows where that column is NaN.

df2 = df[df['Your column with NaN'].isnull()]['Title']

returns the values you want.

For example:

import pandas as pd
import numpy as np
df = pd.DataFrame([list(range(3)), [0, np.nan, np.nan], [0, 0, np.nan], list(range(3)), list(range(3))], columns=["Col_1", "Col_2", "Col_3"])
print(df)
   Col_1  Col_2  Col_3
0     0   1.0   2.0
1     0   NaN   NaN
2     0   0.0   NaN
3     0   1.0   2.0
4     0   1.0   2.0
print(df[df['Col_3'].isnull()])
   Col_1  Col_2  Col_3
1     0   NaN   NaN
2     0   0.0   NaN
df2 = df[df['Col_3'].isnull()]['Col_2']
print(df2)
1    NaN
2    0.0
Name: Col_2, dtype: float64
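
In the question's frame the movie titles appear to be the index rather than a column, which is probably why selecting a "title" column fails. Here is a minimal sketch, assuming a small hand-built frame shaped like the question's output. The variable names toby, mask and nan_titles are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's single-critic frame; titles are the index.
toby = pd.DataFrame(
    {"Toby": [np.nan, np.nan, 4.5, 4.0, np.nan, 1.0]},
    index=pd.Index(
        [
            "Just My Luck",
            "Lady in the Water",
            "Snakes on a Plane",
            "Superman Returns",
            "The Night Listener",
            "You Me and Dupree",
        ],
        name="title",
    ),
)

# Boolean mask: True where Toby has no rating.
mask = toby["Toby"].isna()

# The titles live in the index, so read them from there.
nan_titles = toby.index[mask].tolist()
print(nan_titles)
```

If the titles are needed as an ordinary column instead, calling reset_index() on the frame first should make a "title" column available.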

Edit

I understand your problem now. The main issue is the DataFrame itself: when you used pivot, the columns argument was wrong...
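
For reference, one way the pivot could be written to get titles as rows and critics as columns is to swap the index and columns arguments. This is only a sketch on a made-up four-row subset of the ratings, not the full data from the question.

```python
import pandas as pd

# Hypothetical long-format sample, shaped like the question's ratings frame.
ratings = pd.DataFrame(
    {
        "critic": ["Jack Matthews", "Jack Matthews", "Toby", "Toby"],
        "title": [
            "Lady in the Water",
            "Snakes on a Plane",
            "Snakes on a Plane",
            "Superman Returns",
        ],
        "rating": [3.0, 4.0, 4.5, 4.0],
    }
)

# One row per title, one column per critic; missing (title, critic) pairs become NaN.
movie_rating_T = ratings.pivot(index="title", columns="critic", values="rating")
print(movie_rating_T)
```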

However, you don't need to fix that.

If I'm not mistaken, you now only need the critic and the movies, not the rating itself.

df_Toby = df.loc[df['critic'] == 'Toby']

The condition df['critic'] == 'Toby' selects all rows with that critic's name.

To return the titles, select the 'title' column:

df_Toby = df_Toby['title']

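Or, to keep both title and rating, subset the filtered frame from the first step instead (skip the previous line, which leaves only a Series):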

df_Toby = df_Toby[['title', 'rating']]

After that you can use:

exclude_Nan_df_Toby = df_Toby.dropna()

This excludes all rows that contain NaN and returns only the rows with a valid rating.
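
Put together, the steps above might look like the sketch below. The sample rows are hypothetical, and the NaN rating is added purely so that dropna has something to remove.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format sample; the NaN row is added here for illustration.
df = pd.DataFrame(
    {
        "critic": ["Toby", "Toby", "Toby", "Jack Matthews"],
        "title": [
            "Snakes on a Plane",
            "Superman Returns",
            "Just My Luck",
            "Lady in the Water",
        ],
        "rating": [4.5, 4.0, np.nan, 3.0],
    }
)

# Keep Toby's rows, and only the title and rating columns.
df_Toby = df.loc[df["critic"] == "Toby", ["title", "rating"]]

# Drop the rows whose rating is missing.
exclude_Nan_df_Toby = df_Toby.dropna()
print(exclude_Nan_df_Toby)
```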

Cheers
