NumPy sum over all rows at once gives NaN, but summing separately does not



I have some data, all of it non-negative. NumPy says its sum is nan, but I don't believe that it is. Here is my reasoning:

First, I read in the training data:

dataframe = pandas.read_csv( "buggy.csv" )
training = dataframe.ix[:,dataframe.columns != "Survived"].values.astype( np.float32 )

The training features are stored in a numpy array. I sum the first 61 rows and add the result to the sum of row 62:

sum1 = training[0:61][:].sum()
sum2 = training[62][:].sum()
print sum1 + sum2

I get the following output: 5788.54

Then I sum the first 62 rows in one go:

print training[0:62][:].sum()

I get the following output: nan

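Why does the second sum give nan? All of my data is non-negative, so I would not expect the order of the numbers to matter. Thanks in advance for your help.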

(Also, this is Python 2.7 from Anaconda 4.0.4.)


Here is the complete code:

import numpy as np
import pandas
dataframe = pandas.read_csv( "buggy.csv" )
training = dataframe.ix[:,dataframe.columns != "Survived"].values.astype( np.float32 )
labels = dataframe[ "Survived" ].values.astype( np.float32 )

sum1 = training[0:61][:].sum()
sum2 = training[62][:].sum()
print sum1 + sum2
print training[0:62][:].sum()

Here is the minimal data needed to reproduce the problem (just copy and paste it into a file named "buggy.csv"):

,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
0,0,3,0,22.0,1,0,7.25,2.0
1,1,1,1,38.0,1,0,71.2833,0.0
2,1,3,1,26.0,0,0,7.925,2.0
3,1,1,1,35.0,1,0,53.1,2.0
4,0,3,0,35.0,0,0,8.05,2.0
5,0,3,0,29.6991176471,0,0,8.4583,1.0
6,0,1,0,54.0,0,0,51.8625,2.0
7,0,3,0,2.0,3,1,21.075,2.0
8,1,3,1,27.0,0,2,11.1333,2.0
9,1,2,1,14.0,1,0,30.0708,0.0
10,1,3,1,4.0,1,1,16.7,2.0
11,1,1,1,58.0,0,0,26.55,2.0
12,0,3,0,20.0,0,0,8.05,2.0
13,0,3,0,39.0,1,5,31.275,2.0
14,0,3,1,14.0,0,0,7.8542,2.0
15,1,2,1,55.0,0,0,16.0,2.0
16,0,3,0,2.0,4,1,29.125,1.0
17,1,2,0,29.6991176471,0,0,13.0,2.0
18,0,3,1,31.0,1,0,18.0,2.0
19,1,3,1,29.6991176471,0,0,7.225,0.0
20,0,2,0,35.0,0,0,26.0,2.0
21,1,2,0,34.0,0,0,13.0,2.0
22,1,3,1,15.0,0,0,8.0292,1.0
23,1,1,0,28.0,0,0,35.5,2.0
24,0,3,1,8.0,3,1,21.075,2.0
25,1,3,1,38.0,1,5,31.3875,2.0
26,0,3,0,29.6991176471,0,0,7.225,0.0
27,0,1,0,19.0,3,2,263.0,2.0
28,1,3,1,29.6991176471,0,0,7.8792,1.0
29,0,3,0,29.6991176471,0,0,7.8958,2.0
30,0,1,0,40.0,0,0,27.7208,0.0
31,1,1,1,29.6991176471,1,0,146.5208,0.0
32,1,3,1,29.6991176471,0,0,7.75,1.0
33,0,2,0,66.0,0,0,10.5,2.0
34,0,1,0,28.0,1,0,82.1708,0.0
35,0,1,0,42.0,1,0,52.0,2.0
36,1,3,0,29.6991176471,0,0,7.2292,0.0
37,0,3,0,21.0,0,0,8.05,2.0
38,0,3,1,18.0,2,0,18.0,2.0
39,1,3,1,14.0,1,0,11.2417,0.0
40,0,3,1,40.0,1,0,9.475,2.0
41,0,2,1,27.0,1,0,21.0,2.0
42,0,3,0,29.6991176471,0,0,7.8958,0.0
43,1,2,1,3.0,1,2,41.5792,0.0
44,1,3,1,19.0,0,0,7.8792,1.0
45,0,3,0,29.6991176471,0,0,8.05,2.0
46,0,3,0,29.6991176471,1,0,15.5,1.0
47,1,3,1,29.6991176471,0,0,7.75,1.0
48,0,3,0,29.6991176471,2,0,21.6792,0.0
49,0,3,1,18.0,1,0,17.8,2.0
50,0,3,0,7.0,4,1,39.6875,2.0
51,0,3,0,21.0,0,0,7.8,2.0
52,1,1,1,49.0,1,0,76.7292,0.0
53,1,2,1,29.0,1,0,26.0,2.0
54,0,1,0,65.0,0,1,61.9792,0.0
55,1,1,0,29.6991176471,0,0,35.5,2.0
56,1,2,1,21.0,0,0,10.5,2.0
57,0,3,0,28.5,0,0,7.2292,0.0
58,1,2,1,5.0,1,2,27.75,2.0
59,0,3,0,11.0,5,2,46.9,2.0
60,0,3,0,22.0,0,0,7.2292,0.0
61,1,1,1,38.0,0,0,80.0,
62,0,1,0,45.0,1,0,83.475,2.0

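You are skipping row 61, which is the problematic one. training[0:61][:].sum() does not include row 61: the slice 0:61 covers rows 0 through 60, and training[62] is row 62 alone, so sum1 + sum2 never touches row 61. The slice 0:62, on the other hand, does include it.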
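The half-open slice behavior can be checked on a small stand-in array. This is only a sketch: the array rows and the names first_slice, single_row, covered, and missing are made up for illustration and are not part of the original code.

```python
import numpy as np

# Stand-in for the training array: 63 rows, each holding its own row index.
rows = np.arange(63, dtype=np.float32).reshape(63, 1)

first_slice = rows[0:61]   # rows 0..60; the stop index 61 is excluded
single_row = rows[62]      # row 62 only

# Collect the row indices that the two expressions actually cover.
covered = set(first_slice[:, 0].astype(int).tolist())
covered |= set(single_row.astype(int).tolist())

# Any index in 0..62 that is not covered was skipped.
missing = sorted(set(range(63)) - covered)
print(missing)  # [61]
```
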

training[61]
Out[10]: array([ 61.,   1.,   1.,  38.,   0.,   0.,  80.,  nan], dtype=float32)

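The last column (Embarked) is missing for that row in the CSV, so the row has only 7 valid values and pandas fills the gap with nan. Any sum that includes a nan is itself nan, which is why the sum over rows 0 through 61 comes out as nan.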
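To find rows like this without inspecting them one by one, something along the following lines should work. It is a minimal sketch on a made-up 3x3 array (the values and the names data, nan_rows, and total_ignoring_nan are assumptions for illustration), using np.isnan to locate the rows and np.nansum to sum while treating nan as zero.

```python
import numpy as np

# Hypothetical stand-in data; the second row is missing its last entry.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, np.nan],
                 [7.0, 8.0, 9.0]], dtype=np.float32)

# A plain sum propagates the NaN to the result.
total = data.sum()

# Indices of the rows that contain at least one NaN.
nan_rows = np.where(np.isnan(data).any(axis=1))[0]

# nansum treats NaN entries as zero instead of propagating them.
total_ignoring_nan = np.nansum(data)

print(total)               # nan
print(nan_rows)            # [1]
print(total_ignoring_nan)  # 39.0
```

Applied to the question's array, the same check would be np.isnan(training).any(axis=1). Whether to ignore, fill, or drop the missing value depends on what the Embarked column is used for later, so np.nansum is just one option.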
