ValueError: bad input shape (25000, 0)



I am trying to fit a Gaussian Naive Bayes classifier to a dataset. Here is the code:

import pandas as pd
import numpy as np
from sklearn.naive_bayes import GaussianNB
df = pd.read_csv('train_data.csv')
X = df.iloc[:,0:23]
X
Y = df.iloc[:,24:25]

clf = GaussianNB()
clf.fit(X, Y)

This is what the data looks like:

    LIMIT_BAL   SEX     EDUCATION   MARRIAGE    AGE     PAY_0   PAY_2   PAY_3   PAY_4   PAY_5   ...     BILL_AMT4   BILL_AMT5   BILL_AMT6   PAY_AMT1    PAY_AMT2    PAY_AMT3    PAY_AMT4    PAY_AMT5    PAY_AMT6    default_next_month
0   20000   2   2   1   24  2   2   -1  -1  -2  ...     0   0   0   0   689     0   0   0   0   1
1   120000  2   2   2   26  -1  2   0   0   0   ...     3272    3455    3261    0   1000    1000    1000    0   2000    1
2   90000   2   2   2   34  0   0   0   0   0   ...     14331   14948   15549   1518    1500    1000    1000    1000    5000    0
3   50000   2   2   1   37  0   0   0   0   0   ...     28314   28959   29547   2000    2019    1200    1100    1069    1000    0
4   50000   1   2   1   57  -1  0   -1  0   0   ...     20940   19146   19131   2000    36681   10000   9000    689     679     

default_next_month is the target variable; this is a binary classification problem. Y is meant to hold the last column, but fitting gives this error:

ValueError                                Traceback (most recent call last)
<ipython-input-24-d9885fbe19e4> in <module>()
      3 
      4 clf = GaussianNB()
----> 5 clf.fit(X, Y)
      6 
      7 
/home/fatima/anaconda2/lib/python2.7/site-packages/sklearn/naive_bayes.pyc in fit(self, X, y, sample_weight)
    180             Returns self.
    181         """
--> 182         X, y = check_X_y(X, y)
    183         return self._partial_fit(X, y, np.unique(y), _refit=True,
    184                                  sample_weight=sample_weight)
/home/fatima/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    524                         dtype=None)
    525     else:
--> 526         y = column_or_1d(y, warn=True)
    527         _assert_all_finite(y)
    528     if y_numeric and y.dtype.kind == 'O':
/home/fatima/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in column_or_1d(y, warn)
    560         return np.ravel(y)
    561 
--> 562     raise ValueError("bad input shape {0}".format(shape))
    563 
    564 
ValueError: bad input shape (25000, 0)
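The shape in the error can be reproduced without the real file. The sketch below is a minimal reproduction under stated assumptions: it builds a hypothetical stand-in DataFrame with random values and made-up feature names (f0 to f22) but the same layout as the question, 23 features plus the target, 24 columns in total. It assumes Python 3 with a recent NumPy and pandas, not the Python 2.7 environment in the traceback. Because pandas tolerates out-of-range slice bounds in iloc, df.iloc[:, 24:25] silently returns an empty frame instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_data.csv: 23 feature columns plus the
# target, i.e. 24 columns in total (positions 0..23), as in the question.
rng = np.random.default_rng(0)
n_rows = 100
features = pd.DataFrame(rng.integers(0, 10, size=(n_rows, 23)),
                        columns=[f"f{i}" for i in range(23)])
df = features.assign(default_next_month=rng.integers(0, 2, size=n_rows))

X = df.iloc[:, 0:23]   # positions 0..22: the 23 features
Y = df.iloc[:, 24:25]  # position 24 does not exist, so this slice is empty

print(df.shape)  # (100, 24)
print(X.shape)   # (100, 23)
print(Y.shape)   # (100, 0): the (n_samples, 0) shape that sklearn rejects
```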

All I had to do was change the statement to

Y = df.iloc[:,-1]

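and it works. The frame has 24 columns (positions 0 to 23), so df.iloc[:,24:25] points past the last column and returns an empty (25000, 0) selection, whereas df.iloc[:,-1] returns the target column as a 1-D Series, which is the shape fit expects for y.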
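A minimal sketch of the fix, again on a hypothetical stand-in DataFrame with random features and made-up column names (assuming Python 3 and recent pandas and scikit-learn). It shows the positional fix from above alongside a name-based alternative that selects the target by its column name, so there is no column position to get wrong.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for train_data.csv: 23 features, target last.
rng = np.random.default_rng(0)
n_rows = 100
df = pd.DataFrame(rng.normal(size=(n_rows, 23)),
                  columns=[f"f{i}" for i in range(23)])
df["default_next_month"] = rng.integers(0, 2, size=n_rows)

# Positional fix from the question: the last column as a 1-D Series.
y_pos = df.iloc[:, -1]

# Name-based alternative: features are "everything except the target".
X = df.drop(columns="default_next_month")
y = df["default_next_month"]

clf = GaussianNB()
clf.fit(X, y)           # y is 1-D with shape (n_samples,), so validation passes
preds = clf.predict(X)

print(y.shape)      # (100,)
print(preds.shape)  # (100,)
```

Selecting by name also keeps working if columns are later added or reordered in the CSV, which is exactly where hard-coded positions like 24:25 tend to break.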
