movie_rating_T.iloc[:,5:6]
critic Toby
title
Just My Luck NaN
Lady in the Water NaN
Snakes on a Plane 4.5
Superman Returns 4.0
The Night Listener NaN
You Me and Dupree 1.0
Suppose I only want to select the rows that are NaN:
Just My Luck
Lady in the Water
The Night Listener
How can I extract only the NaN values from the DataFrame?
critic Toby
title
Just My Luck NaN
Lady in the Water NaN
The Night Listener NaN
Using ['title'] does not work.
========================================================================
movie_rating_T.iloc[:,5:6]
critic Toby
title
Just My Luck NaN
Lady in the Water NaN
Snakes on a Plane 4.5
Superman Returns 4.0
The Night Listener NaN
You Me and Dupree 1.0
df_MovieRatingT[df_MovieRatingT['Toby'].isnull()]
critic Toby
title
Just My Luck NaN
Lady in the Water NaN
The Night Listener NaN
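If only the titles are wanted rather than the whole rows, one option is to read them off the index, since title is the index of the pivoted frame and not a regular column. A minimal sketch, assuming a small stand-in frame rebuilt from the values shown above (the names toby_missing and missing_titles are illustrative and not from the original post):

```python
import numpy as np
import pandas as pd

# Stand-in for the pivoted frame: 'title' is the index, 'Toby' is a column.
df_MovieRatingT = pd.DataFrame(
    {"Toby": [np.nan, np.nan, 4.5, 4.0, np.nan, 1.0]},
    index=pd.Index(
        ["Just My Luck", "Lady in the Water", "Snakes on a Plane",
         "Superman Returns", "The Night Listener", "You Me and Dupree"],
        name="title",
    ),
)

# Boolean mask: True where Toby has no rating.
toby_missing = df_MovieRatingT["Toby"].isnull()

# Filter the rows, then take the index labels (the titles) as a plain list.
missing_titles = df_MovieRatingT[toby_missing].index.tolist()
print(missing_titles)
```

This is probably also why selecting ['title'] fails on the pivoted frame: there is no column with that name, only an index.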
========================================================================
df = DataFrame(ratings)
critic title rating
0 Jack Matthews Lady in the Water 3.0
1 Jack Matthews Snakes on a Plane 4.0
2 Jack Matthews You Me and Dupree 3.5
3 Jack Matthews Superman Returns 5.0
I want to produce:
critic Claudia Puig Gene Seymour Jack Matthews Lisa Rose Mick LaSalle Toby
title
Just My Luck 3.0 1.5 NaN 3.0 2.0 NaN
Lady in the Water NaN 3.0 3.0 2.5 3.0 NaN
Snakes on a Plane 3.5 3.5 4.0 3.5 4.0 4.5
Superman Returns 4.0 5.0 5.0 3.5 3.0 4.0
The Night Listener 4.5 3.0 3.0 3.0 3.0 NaN
You Me and Dupree 2.5 3.5 3.5 2.5 2.0 1.0
I used
movie_rating= ratings.pivot(index='critic', columns='title',values='rating')
But it put title and critic in the same column. How can I fix this?
You can use isnull from pandas:
df[df['Your column with NaN'].isnull()]
This returns the rows that contain NaN in that column.
df2 = df[df['Your column with NaN'].isnull()]['Title']
will return what you want.
For example:
import pandas as pd
import numpy as np
df = pd.DataFrame([range(3), [0, np.nan, np.nan], [0, 0, np.nan], range(3), range(3)], columns=["Col_1", "Col_2", "Col_3"])
print(df)
Col_1 Col_2 Col_3
0 0 1.0 2.0
1 0 NaN NaN
2 0 0.0 NaN
3 0 1.0 2.0
4 0 1.0 2.0
print(df[df['Col_3'].isnull()])
Col_1 Col_2 Col_3
1 0 NaN NaN
2 0 0.0 NaN
df2 = df[df['Col_3'].isnull()]['Col_2']
print(df2)
1 NaN
2 0.0
Name: Col_2, dtype: float64
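The chained form df[mask]['Col_2'] is fine for reading values, but the same selection can also be written as a single .loc call, which avoids chained indexing if the result is later assigned to. A sketch on the same sample frame (the names mask and col2_where_col3_missing are made up for illustration):

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above.
df = pd.DataFrame(
    [[0, 1, 2], [0, np.nan, np.nan], [0, 0, np.nan], [0, 1, 2], [0, 1, 2]],
    columns=["Col_1", "Col_2", "Col_3"],
)

# True for rows where Col_3 is missing.
mask = df["Col_3"].isnull()

# Row filter and column selection in one step.
col2_where_col3_missing = df.loc[mask, "Col_2"]
print(col2_where_col3_missing)
```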
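Edit
I understand your problem now. The main issue is the DataFrame itself: when using pivot, the columns argument is wrong...
However, you don't need to fix this.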
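For completeness, here is a sketch of how pivot could be called so that titles become rows and critics become columns. It assumes the long-format frame is named ratings and has critic, title, and rating columns as in the question; the four sample rows are a small subset of the values shown in the question, picked for illustration:

```python
import pandas as pd

# Long format: one row per (critic, title) pair.
ratings = pd.DataFrame({
    "critic": ["Jack Matthews", "Jack Matthews", "Toby", "Toby"],
    "title": ["Lady in the Water", "Snakes on a Plane",
              "Snakes on a Plane", "Superman Returns"],
    "rating": [3.0, 4.0, 4.5, 4.0],
})

# index -> row labels, columns -> column labels, values -> cell contents.
movie_rating = ratings.pivot(index="title", columns="critic", values="rating")
print(movie_rating)
```

The critic and title labels that appear in the top-left corner of the printed table are the names of the column axis and of the index, not a merged column. Pairs with no rating in the long data come out as NaN.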
If I'm not mistaken, you now only need the critic and the movie, not the rating itself.
df_Toby = df.loc[df['critic'] == 'Toby']
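The condition df['critic'] == 'Toby' selects all rows with that critic's name.
To return only the titles, select the 'title' column:
titles_Toby = df_Toby['title']
Or, to keep both the title and the rating:
df_Toby = df_Toby[['title', 'rating']]
After that you can use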
exclude_Nan_df_Toby = df_Toby.dropna()
This excludes all rows that contain NaN and returns only the rows with a valid rating.
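Put together, the steps above might look like the following sketch. It assumes a long-format frame with critic, title, and rating columns; the sample rows, including the NaN rating for Just My Luck, are illustrative stand-ins and not the asker's actual data:

```python
import numpy as np
import pandas as pd

# Illustrative long-format data; one Toby row has no rating.
df = pd.DataFrame({
    "critic": ["Toby", "Toby", "Toby", "Jack Matthews"],
    "title": ["Just My Luck", "Snakes on a Plane",
              "Superman Returns", "Superman Returns"],
    "rating": [np.nan, 4.5, 4.0, 5.0],
})

# Keep Toby's rows and only the title and rating columns.
df_Toby = df.loc[df["critic"] == "Toby", ["title", "rating"]]

# Drop the rows where the rating is missing.
exclude_Nan_df_Toby = df_Toby.dropna()
print(exclude_Nan_df_Toby)
```

Note that dropna() only has an effect if the long data actually contains NaN ratings; if unrated movies are simply absent from the table, there is nothing to drop.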
Cheers