Using an RBF kernel with the chi-squared distance in an SVM



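How can I accomplish the task in the title? Is there a parameter of the RBF kernel that sets the distance metric to the chi-squared distance? I can see a chi2_kernel in the sk-learn library.

Below is the code I have written.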

import numpy as np
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
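from sklearn.impute import SimpleImputer
from numpy import genfromtxt
from sklearn.metrics.pairwise import chi2_kernel

file_csv = 'dermatology.data.csv'
dataset = genfromtxt(file_csv, delimiter=',')
# Imputer was removed in scikit-learn 0.22. SimpleImputer replaces it and
# always imputes per column (it has no axis argument, unlike the old axis=1).
imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
dataset = imp.fit_transform(dataset)
target = dataset[:, 34]   # last column holds the class label
data = dataset[:, :34]    # first 34 columns are the features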
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.3)
# TODO: I want to use the chi-squared distance metric here instead. How can I do that?
clf = svm.SVC(kernel='rbf', C=1)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(f1_score(y_test, y_pred, average="macro"))
print(precision_score(y_test, y_pred, average="macro"))
print(recall_score(y_test, y_pred, average="macro"))

Are you sure you want to compose RBF and chi2? Chi2 on its own defines a valid kernel, so all you have to do is

clf = svm.SVC(kernel=chi2_kernel, C=1)

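since sklearn accepts functions as kernels (although this requires O(n^2) memory and time). If you want to compose the two, it is somewhat more complex: you would have to implement your own kernel to do that. For more control (and other kernels) you could also try pykernels, though it does not support this yet.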
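As a rough sketch of the options above: scikit-learn documents chi2_kernel as k(x, y) = exp(-gamma * sum((x - y)^2 / (x + y))), so it may already be the exponentiated form the question is after. The example below passes it to SVC as a callable with an explicit gamma, passes the same values as a precomputed Gram matrix, and builds a hand-written kernel from additive_chi2_kernel. The digits dataset, the scaling by 16, gamma=0.5, and the helper name rbf_chi2_kernel are illustrative assumptions, not part of the original question. Note that chi2_kernel requires non-negative features.

```python
from functools import partial

import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics.pairwise import additive_chi2_kernel, chi2_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): pixel intensities are non-negative,
# which the chi-squared kernels require.
X, y = load_digits(return_X_y=True)
X = X / 16.0
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

GAMMA = 0.5  # illustrative value, tune it like any other hyperparameter

# Option 1: pass the kernel as a callable, fixing gamma with functools.partial.
clf = SVC(kernel=partial(chi2_kernel, gamma=GAMMA), C=1)
clf.fit(X_train, y_train)

# Option 2: precompute the Gram matrices yourself. The training matrix is
# n_train x n_train, which is where the O(n^2) memory cost comes from.
K_train = chi2_kernel(X_train, X_train, gamma=GAMMA)
K_test = chi2_kernel(X_test, X_train, gamma=GAMMA)
clf_pre = SVC(kernel="precomputed", C=1)
clf_pre.fit(K_train, y_train)


# Option 3: a hand-written kernel (hypothetical helper). additive_chi2_kernel
# returns -sum((x - y)^2 / (x + y)), so exponentiating gamma times it gives an
# RBF-style kernel over the chi-squared distance.
def rbf_chi2_kernel(A, B, gamma=GAMMA):
    return np.exp(gamma * additive_chi2_kernel(A, B))


clf_custom = SVC(kernel=rbf_chi2_kernel, C=1)
clf_custom.fit(X_train, y_train)

print(clf.score(X_test, y_test))
print(clf_pre.score(K_test, y_test))
print(clf_custom.score(X_test, y_test))
```

The hand-written kernel is the place to change the formula if a different composition is wanted, for example a different normalization of the distance. Whether the result is still a valid (positive semi-definite) kernel would then need to be checked separately.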
