Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative

I am trying to split my dataset into a training set and a test set using:

for train_set, test_set in stratified.split(complete_df, complete_df["loan_condition_int"]):
    stratified_train = complete_df.loc[train_set]
    stratified_test = complete_df.loc[test_set]

My DataFrame complete_df does not contain any NaN values. I checked this with complete_df.isnull().sum().max(), which returns 0.

But I still get a warning saying:

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

This later leads to an error. I tried a few techniques I found online, but none of them solved it.

First, you should clarify what stratified is. I assume it is a sklearn StratifiedShuffleSplit object.

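"My dataset complete_df does not have any NaN values."

The "missing label" in the warning message does not refer to missing values, i.e. NaN. The warning is saying that train_set and/or test_set contain values (labels) that do not exist in the index of complete_df. This happens because .loc indexes by row (and column) labels, not by row position, whereas train_set and test_set hold row positions. So if your DataFrame's index does not coincide with the integer positions of the rows, which appears to be the case here, the warning is raised.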

To select by row position, use iloc. This should work:

for train_set, test_set in stratified.split(complete_df, complete_df["loan_condition_int"]):
    stratified_train = complete_df.iloc[train_set]
    stratified_test = complete_df.iloc[test_set]
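
For anyone who wants to reproduce this, here is a minimal, self-contained sketch. The DataFrame contents, the loan_amount column, the index range, and the StratifiedShuffleSplit parameters are all assumptions made for illustration, since the question does not show how complete_df or stratified were built. The index is deliberately offset from the row positions, so the positions returned by split() do not exist as labels.

```python
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical stand-in for complete_df: the index labels (100..119)
# deliberately differ from the row positions (0..19).
complete_df = pd.DataFrame(
    {"loan_amount": range(20), "loan_condition_int": [0, 1] * 10},
    index=range(100, 120),
)

# Assumed configuration; the question does not show how `stratified` was built.
stratified = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)

for train_set, test_set in stratified.split(complete_df, complete_df["loan_condition_int"]):
    # split() yields integer row positions; here none of them is a label in the index.
    missing_labels = [int(p) for p in train_set if p not in complete_df.index]

    # .iloc selects by position, so it works whatever the index labels are.
    stratified_train = complete_df.iloc[train_set]
    stratified_test = complete_df.iloc[test_set]
```

If you would rather keep using .loc, another option is to call complete_df.reset_index(drop=True) before splitting, so that the labels and the row positions coincide.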
