Unexpected IndexError when creating a DataFrame



I am trying to run the following code:

heart_df = pd.read_csv(r"location")
X = heart_df.iloc[:, :-1].values
y = heart_df.iloc[:, 11].values
new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]].values() #this is line 17
cat_cols = new_df.copy()

and I get an IndexError like this:

  File "***location***", line 17, in <module>
  new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]].values()
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

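As far as I know, an IndexError like this appears when a float is used as an index, but I don't understand why it occurs here.

By creating new_df and cat_cols, I want to separate out the categorical columns so that I can apply OneHotEncoding at a later stage.

The dataset is here: https://www.kaggle.com/fedesoriano/heart-failure-prediction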

The error comes from this line:

X = heart_df.iloc[:, :-1].values

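The .values attribute converts the DataFrame into a NumPy array. A NumPy array has no column labels, so it can only be indexed with integers, slices, or integer/boolean arrays. Indexing X with a list of column names such as "Sex" therefore raises the IndexError; floats are not involved here.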
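To see the difference in isolation, here is a minimal sketch. The two-row frame is a made-up stand-in for the real CSV (only the column names "Sex" and "Age" mirror the dataset; "HeartDisease" as the last column and all values are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the heart dataset; values are invented.
df = pd.DataFrame({"Age": [40, 49], "Sex": ["M", "F"], "HeartDisease": [0, 1]})

X_array = df.iloc[:, :-1].values  # NumPy array: column labels are gone
X_frame = df.iloc[:, :-1]         # still a DataFrame: labels are kept

print(type(X_array).__name__)

# Selecting by column name fails on the array...
try:
    X_array[["Sex"]]
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# ...but works on the DataFrame.
sex_column = X_frame[["Sex"]]
print(list(sex_column.columns))
```

On the array, columns can only be reached by position, for example X_array[:, 1].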

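So keep X as a DataFrame and select the columns by name. Note also that .values is an attribute, not a method, so the trailing .values() call is dropped: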

X = heart_df.iloc[:, :-1]
new_df = X[["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]]
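Since the stated goal is to one-hot encode the categorical columns later, the following sketch shows one possible way to continue from the DataFrame version. It uses pandas' get_dummies rather than scikit-learn's OneHotEncoder, which is a swapped-in choice and not necessarily what the asker intended. The three sample rows, the "Age" column, and the "HeartDisease" target name are invented for illustration; only the five categorical column names come from the question.

```python
import pandas as pd

cat_names = ["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]

# Hypothetical sample rows standing in for pd.read_csv(...); values are invented.
heart_df = pd.DataFrame({
    "Age": [40, 49, 37],
    "Sex": ["M", "F", "M"],
    "ChestPainType": ["ATA", "NAP", "ATA"],
    "RestingECG": ["Normal", "Normal", "ST"],
    "ExerciseAngina": ["N", "N", "Y"],
    "ST_Slope": ["Up", "Flat", "Up"],
    "HeartDisease": [0, 1, 0],
})

X = heart_df.iloc[:, :-1]         # features, kept as a DataFrame
y = heart_df.iloc[:, -1].values   # target can safely become an array

cat_cols = X[cat_names].copy()    # label-based selection works on a DataFrame
encoded = pd.get_dummies(cat_cols)  # one indicator column per category value

print(encoded.columns.tolist())
```

Converting to a NumPy array with .values (or .to_numpy()) is best left until after all label-based selection is done.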
