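I am working on binary classification of an imbalanced dataset. The dataset contains 777 minority-class samples and 2223 majority-class samples. I built a one-class SVM model using only the records labeled as the minority class. But when I try to predict with the fitted model, the predicted values are nearly all -1, so the accuracy is 0. I have already scaled my features. This is my implementation: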
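import pandas as pd
from sklearn import svm
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

ml_file_df = pd.read_csv('/data/jayashree/3000_ML_features.csv')
# Keep only the minority class (RESULT == 0)
minority_df = ml_file_df[ml_file_df['RESULT'] == 0]
array = minority_df.values
features = array[:, 0:60630]
labels = array[:, 60630]  # every label here is 0
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(features)
features_train, features_test, labels_train, labels_test = train_test_split(
    scaled_features, labels, test_size=0.3, random_state=10)
gamma_values = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5]
nu_values = [0.1, 0.3, 0.5, 0.7]
for j in nu_values:
    for i in gamma_values:
        clf = svm.OneClassSVM(nu=j, kernel='rbf', gamma=i)
        # OneClassSVM is unsupervised: the labels passed to fit() are ignored
        clf.fit(features_train, labels_train)
        pred = clf.predict(features_test)  # values are +1 (inlier) or -1 (outlier)
        print(i, classification_report(labels_test, pred))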
For every parameter combination, I get predictions like this:
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 1 -1 -1 -1 -1 1]
Where am I going wrong?
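I have solved the error. A one-class SVM predicts either 1 or -1, never 0. I relabeled the minority class as 1 and the majority class as -1, and that fixed my problem.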
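The relabeling described here can be sketched roughly as follows. This is only an illustration on synthetic data, not the original dataset: the two Gaussian clusters, the variable names, and the `nu` and `gamma` settings are all assumptions. The point it shows is that the true labels are re-encoded to the +1/-1 convention before scoring, and that the test set contains both classes. In the code from the question, `labels_test` holds only 0s while `predict` returns only +1 or -1, so no prediction can ever match a label.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumption):
# label 0 = minority class, label 1 = majority class.
X_min = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
X_maj = rng.normal(loc=6.0, scale=1.0, size=(600, 5))

# Hold out part of the minority class; train only on the rest.
X_min_train, X_min_test = train_test_split(X_min, test_size=0.3, random_state=10)

# The test set mixes both classes and still uses the original 0/1 labels.
X_test = np.vstack([X_min_test, X_maj])
y_test = np.concatenate([
    np.zeros(len(X_min_test), dtype=int),
    np.ones(len(X_maj), dtype=int),
])

# Re-encode the true labels to the one-class convention:
# minority (the class the model is trained on) -> +1, majority -> -1.
y_test_ocsvm = np.where(y_test == 0, 1, -1)

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale")
clf.fit(X_min_train)        # unsupervised: no labels needed
pred = clf.predict(X_test)  # +1 for inliers, -1 for outliers

acc = accuracy_score(y_test_ocsvm, pred)
print("accuracy:", acc)
```

Because the labels and the predictions now share the same encoding, `accuracy_score` and `classification_report` compare like with like.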
That is because your labels are 0 and 1, while a one-class SVM outputs -1 and 1. Replacing -1 with 0 is enough.
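Going the other way, the predictions can be mapped back into the dataset's own label space instead of changing the labels. A minimal sketch, where `to_dataset_labels` is a hypothetical helper (not part of scikit-learn) and the sample predictions are made up:

```python
def to_dataset_labels(predictions, inlier_label=1, outlier_label=0):
    """Map one-class SVM outputs (+1 inlier, -1 outlier) to dataset labels."""
    mapping = {1: inlier_label, -1: outlier_label}
    return [mapping[int(p)] for p in predictions]


preds = [-1, 1, -1, -1, 1]  # made-up example output of clf.predict(...)

# Default: swap -1 for 0 and keep 1 as it is.
print(to_dataset_labels(preds))

# If the model was trained on the class labeled 0 (as in the question),
# inliers correspond to 0 and outliers to 1.
print(to_dataset_labels(preds, inlier_label=0, outlier_label=1))
```

Which mapping is right depends on which class the model was fitted on: the class seen during training is the one reported as +1.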