WEKA IBk with EditDistance (Levenshtein distance) gives wrong results - Java



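I am very new to WEKA. Today I tried the IBk algorithm with the Levenshtein distance as its distance function, to classify strings into different classes. However, the results are very poor: my input is always assigned to the same class (class b), which is clearly wrong. Can anyone tell me what I am doing wrong?

My code is currently very simple: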

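import java.io.File;
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.core.EditDistance;
import weka.core.Instances;
import weka.core.converters.CSVLoader;

// Load the CSV file; the last column is the class attribute.
CSVLoader loader = new CSVLoader();
loader.setSource(new File("current_path"));
Instances data = loader.getDataSet();
data.setClassIndex(data.numAttributes() - 1);

// 1-nearest-neighbour classifier using the edit distance.
EditDistance newWeka = new EditDistance();
IBk ibk = new IBk(1);
ibk.getNearestNeighbourSearchAlgorithm().setDistanceFunction(newWeka);
ibk.setCrossValidate(false);
ibk.setMeanSquared(false);
ibk.buildClassifier(data);
System.out.println(ibk);

// Evaluate on the training data itself.
Evaluation eval = new Evaluation(data);
eval.evaluateModel(ibk, data);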

Result:

** KNN Demo  **
Correctly Classified Instances           4               50      %
Incorrectly Classified Instances         4               50      %
Kappa statistic                          0     
Mean absolute error                      0.398 
Root mean squared error                  0.4449
Relative absolute error                 97.2913 %
Root relative squared error             99.5586 %
Total Number of Instances                8     
=== Detailed Accuracy By Class ===
                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0,000    0,000    ?          0,000    ?          ?        0,500     0,375     Surname
                 1,000    1,000    0,500      1,000    0,667      ?        0,500     0,500     Firstname
                 0,000    0,000    ?          0,000    ?          ?        0,500     0,125     Job
Weighted Avg.    0,500    0,500    ?          0,500    ?          ?        0,500     0,406
=== Confusion Matrix ===
a b c   <-- classified as
0 3 0 | a = Surname
0 4 0 | b = Firstname
0 1 0 | c = Job

File:

"Attribute","class"
"Wellbrock","Surname"
"Kohler","Surname"
"Sanger","Surname"
"Jan","Firstname"
"Anna","Firstname"
"Tim","Firstname"
"Schmidt","Firstname"
"Consultant","Job"

Thank you very much for your help.

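I found the solution myself. The problem is that, with the Java API, the default search algorithm appears to be ZeroR, which always assigns every instance to the most frequent class.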
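
To see why a majority-class fallback would produce exactly the confusion matrix from the question, here is a minimal stand-alone sketch. It does not use WEKA; the class MajorityBaseline and its methods are hypothetical names I made up for illustration. On the eight labels from the file, the most frequent class is Firstname (4 of 8), so predicting it everywhere gives 50 % accuracy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of WEKA: mimics a majority-class
// (ZeroR-style) baseline on a list of class labels.
final class MajorityBaseline {

    // Returns the most frequent label; ties go to the label seen first.
    static String majorityClass(String[] labels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    // Fraction of instances whose label equals the majority label.
    static double accuracy(String[] labels) {
        String majority = majorityClass(labels);
        int correct = 0;
        for (String label : labels) {
            if (label.equals(majority)) {
                correct++;
            }
        }
        return (double) correct / labels.length;
    }

    public static void main(String[] args) {
        // Class column of the CSV file from the question.
        String[] labels = {
            "Surname", "Surname", "Surname",
            "Firstname", "Firstname", "Firstname", "Firstname",
            "Job"
        };
        System.out.println(majorityClass(labels) + " " + accuracy(labels));
    }
}
```

This matches the 4 correct / 4 incorrect split and the Kappa statistic of 0 in the output above: the classifier was not using the strings at all.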

I added this line to my code, and the results are now as expected: ibk.setNearestNeighbourSearchAlgorithm(new LinearNNSearch());

=== Confusion Matrix ===
a b c   <-- classified as
3 0 0 | a = Surname
0 4 0 | b = Firstname
0 0 6 | c = Job
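
For comparison, this is roughly what a 1-nearest-neighbour search with an edit distance does. It is a plain-Java sketch, not WEKA's implementation: the class EditDistanceNN and its methods are hypothetical, and the distance is the textbook Levenshtein distance, which may differ in detail from WEKA's EditDistance. When the model is evaluated on its own training data, every string finds itself at distance 0, so a fully diagonal confusion matrix is what one would expect.

```java
// Hypothetical stand-alone sketch, not part of WEKA.
final class EditDistanceNN {

    // Textbook Levenshtein distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the label of the training string closest to the query
    // (linear scan; the first of several equally close strings wins).
    static String classify(String query, String[] values, String[] labels) {
        int bestIndex = 0;
        int bestDistance = Integer.MAX_VALUE;
        for (int i = 0; i < values.length; i++) {
            int d = levenshtein(query, values[i]);
            if (d < bestDistance) {
                bestDistance = d;
                bestIndex = i;
            }
        }
        return labels[bestIndex];
    }

    public static void main(String[] args) {
        // Data from the CSV file in the question.
        String[] values = {"Wellbrock", "Kohler", "Sanger", "Jan",
                           "Anna", "Tim", "Schmidt", "Consultant"};
        String[] labels = {"Surname", "Surname", "Surname", "Firstname",
                           "Firstname", "Firstname", "Firstname", "Job"};
        for (String value : values) {
            System.out.println(value + " -> " + classify(value, values, labels));
        }
    }
}
```

The linear scan over all training strings plays the same role here as LinearNNSearch does in the fix above.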
