Getting NaN values in a DataFrame, but not sure why



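I'm trying to create a DataFrame from data pulled from a GitHub URL. I then sort the Age column from that file into a new DataFrame, where the AGE_12 column holds Age values between 1 and 12 and the AGE_TEEN column holds Age values between 13 and 19. However, when I assign the AGE_12 and AGE_TEEN data to their columns in the new DataFrame, I end up with NaN values for them. I tried switching the order of the columns: sometimes AGE_12 gets the correct values but the other column does not, and vice versa.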

Here is my code:

import pandas as pd

# Raw GitHub URL of the Titanic training data
url = 'https://raw.githubusercontent.com/wesm/pydata-book/2nd-edition/datasets/titanic/train.csv'
# Create a DataFrame from the raw GitHub link
# (on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0)
data = pd.read_csv(url, on_bad_lines='skip')

# Rows whose Age is in 1-12 and 13-19 (between() includes both endpoints)
AGE_12 = data[data['Age'].between(1, 12)]
AGE_TEEN = data[data['Age'].between(13, 19)]

pasUpto19 = pd.DataFrame()
pasUpto19 = pasUpto19.assign(PCLASS=data['Pclass'], AGE_12=AGE_12['Age'], AGE_TEEN=AGE_TEEN['Age'])
print(pasUpto19)

It outputs this:

     PCLASS  AGE_12  AGE_TEEN
0         3     NaN       NaN
1         1     NaN       NaN
2         3     NaN       NaN
3         1     NaN       NaN
4         3     NaN       NaN
..      ...     ...       ...
886       2     NaN       NaN
887       1     NaN      19.0
888       3     NaN       NaN
889       1     NaN       NaN
890       3     NaN       NaN

Apologies in advance if I've done something silly; I'm very new to Python and to pandas.

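The NaN values come from index alignment: assign() matches each Series to the rows of data['Pclass'] by index label, so every row whose Age falls outside a given range gets NaN in that column. To keep only the rows that have a value in at least one of the age columns, use pasUpto19 = pasUpto19.dropna(subset=['AGE_12', 'AGE_TEEN'], how='all'). Note that dropna(axis=0, how='all') without subset drops nothing here, because PCLASS is never NaN.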
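
A minimal sketch of the alignment behaviour and the row filtering, using a small made-up frame instead of the Titanic CSV so it runs offline. The sample values and the snake_case variable names are assumptions for illustration only; the `subset` argument limits the NaN check to the two age columns.

```python
import pandas as pd

# Small stand-in for the Titanic data (hypothetical values, not the real file).
data = pd.DataFrame({
    "Pclass": [3, 1, 3, 2, 1],
    "Age": [4.0, 38.0, 15.0, None, 19.0],
})

# Filtering keeps the original row labels, so each subset covers only some rows.
age_12 = data.loc[data["Age"].between(1, 12), "Age"]     # row label 0
age_teen = data.loc[data["Age"].between(13, 19), "Age"]  # row labels 2 and 4

# assign() aligns each Series on the frame's index; row labels that are
# missing from a subset are filled with NaN.
pas_upto_19 = pd.DataFrame({"PCLASS": data["Pclass"]}).assign(
    AGE_12=age_12, AGE_TEEN=age_teen
)

# Drop only the rows where both age columns are NaN.
result = pas_upto_19.dropna(subset=["AGE_12", "AGE_TEEN"], how="all")
print(result)
```

Each remaining row still has NaN in one of the two age columns, since a passenger can fall into at most one of the ranges.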
