I'm trying to fit a Gaussian Naive Bayes classifier to a dataset. Here is the code:
import pandas as pd
import numpy as np
from sklearn.naive_bayes import GaussianNB
df = pd.read_csv('train_data.csv')
X = df.iloc[:,0:23]
X
Y = df.iloc[:,24:25]
clf = GaussianNB()
clf.fit(X, Y)
This is what the data looks like:
LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 PAY_2 PAY_3 PAY_4 PAY_5 ... BILL_AMT4 BILL_AMT5 BILL_AMT6 PAY_AMT1 PAY_AMT2 PAY_AMT3 PAY_AMT4 PAY_AMT5 PAY_AMT6 default_next_month
0 20000 2 2 1 24 2 2 -1 -1 -2 ... 0 0 0 0 689 0 0 0 0 1
1 120000 2 2 2 26 -1 2 0 0 0 ... 3272 3455 3261 0 1000 1000 1000 0 2000 1
2 90000 2 2 2 34 0 0 0 0 0 ... 14331 14948 15549 1518 1500 1000 1000 1000 5000 0
3 50000 2 2 1 37 0 0 0 0 0 ... 28314 28959 29547 2000 2019 1200 1100 1069 1000 0
4 50000 1 2 1 57 -1 0 -1 0 0 ... 20940 19146 19131 2000 36681 10000 9000 689 679
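default_next_month is the target variable; this is a binary classification problem. Y is supposed to contain the last column, but fitting gives this error: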
ValueError Traceback (most recent call last)
<ipython-input-24-d9885fbe19e4> in <module>()
3
4 clf = GaussianNB()
----> 5 clf.fit(X, Y)
6
7
/home/fatima/anaconda2/lib/python2.7/site-packages/sklearn/naive_bayes.pyc in fit(self, X, y, sample_weight)
180 Returns self.
181 """
--> 182 X, y = check_X_y(X, y)
183 return self._partial_fit(X, y, np.unique(y), _refit=True,
184 sample_weight=sample_weight)
/home/fatima/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
524 dtype=None)
525 else:
--> 526 y = column_or_1d(y, warn=True)
527 _assert_all_finite(y)
528 if y_numeric and y.dtype.kind == 'O':
/home/fatima/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in column_or_1d(y, warn)
560 return np.ravel(y)
561
--> 562 raise ValueError("bad input shape {0}".format(shape))
563
564
ValueError: bad input shape (25000, 0)
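The (25000, 0) in the message says the target slice has rows but no columns. One way to see how that happens is to compare column selections on a hypothetical 24-column frame of zeros (positions 0 to 23). pandas clips out-of-range iloc slice bounds instead of raising, so a slice that starts at 24 is simply empty. A minimal sketch:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 5 rows and 24 columns at positions 0..23.
df = pd.DataFrame(np.zeros((5, 24)))

empty = df.iloc[:, 24:25]    # slice starts past the last position: no columns
print(empty.shape)           # (5, 0)

last_2d = df.iloc[:, 23:24]  # right position, but still a 2-D DataFrame
print(last_2d.shape)         # (5, 1)

last_1d = df.iloc[:, -1]     # a single integer position gives a 1-D Series
print(last_1d.shape)         # (5,)
```

As the traceback shows, check_X_y passes y through column_or_1d, which ravels a (n, 1) target (with a warning, since warn=True) but rejects a (n, 0) one with "bad input shape".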
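All I had to do was change the statement to
Y = df.iloc[:,-1]
and it works. The shape (25000, 0) means iloc[:,24:25] selected no columns at all: iloc positions are 0-based and there is no column at position 24, so the slice comes back empty instead of raising. iloc[:,-1] picks the last column and returns a 1-D Series, which is the shape GaussianNB expects for y.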
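For completeness, a minimal end-to-end sketch of the working version. It assumes synthetic random data in place of train_data.csv: 23 hypothetical feature columns named f0 to f22, followed by a binary default_next_month column in last position. Selecting the target by name gives the same 1-D Series without depending on column positions:

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

# Hypothetical synthetic data standing in for train_data.csv:
# 23 numeric features followed by a binary target in the last column.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.normal(size=(100, 23)),
                  columns=["f%d" % i for i in range(23)])
df["default_next_month"] = rng.randint(0, 2, size=100)

X = df.iloc[:, :-1]  # every column except the last
Y = df.iloc[:, -1]   # last column as a 1-D Series

# Equivalent, position-independent selection by column name.
Y_by_name = df["default_next_month"]
assert Y.equals(Y_by_name)

clf = GaussianNB()
clf.fit(X, Y)
print(X.shape, Y.shape)  # (100, 23) (100,)
```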