Cannot get the class label in Weka (Java): prediction is *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES*



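I am trying to classify an instance in Java using the Weka library and an online tutorial.

I have built a model on my machine, saved it to disk, and load it from there. The model is built with this code: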

public void makeModel() throws Exception
{
    // load the training data; the class is the second attribute
    ArffLoader loader = new ArffLoader();
    loader.setFile(new File("data.arff"));
    Instances structure = loader.getDataSet();
    structure.setClassIndex(1);

    // train NaiveBayesMultinomial behind a StringToWordVector filter
    NaiveBayesMultinomial n = new NaiveBayesMultinomial();
    FilteredClassifier f = new FilteredClassifier();
    StringToWordVector s = new StringToWordVector();
    s.setUseStoplist(true);
    s.setWordsToKeep(100);
    f.setFilter(s);
    f.setClassifier(n);
    f.buildClassifier(structure);

    // 10-fold cross-validation
    Evaluation eval = new Evaluation(structure);
    eval.crossValidateModel(f, structure, 10, new Random(1));
    System.out.println(eval.toSummaryString("\nResults\n======\n", false));

    // output generated model
    //System.out.println(f);
    ObjectOutputStream oos = new ObjectOutputStream(
                               new FileOutputStream("classifier.model"));
    oos.writeObject(f);
    oos.flush();
    oos.close();
}

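------------------------ Output -------------

Results
======

Correctly Classified Instances       20158               79.6948 %
Incorrectly Classified Instances      5136               20.3052 %
Kappa statistic                          0.6737
Mean absolute error                      0.0726
Root mean squared error                  0.2025
Relative absolute error                 38.7564 %
Root relative squared error             66.1815 %
Coverage of cases (0.95 level)          96.4142 %
Mean rel. region size (0.95 level)      27.7531 %
Total Number of Instances            25294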


I then use the same model to classify unlabeled instances.

public void classify() throws Exception
{
    FilteredClassifier cls = (FilteredClassifier) weka.core.SerializationHelper.read("classifier.model");

    Instances unlabeled = new Instances(
                            new BufferedReader(
                              new FileReader("test.arff")));
    // set class attribute
    unlabeled.setClassIndex(0);
    // create copy
    Instances labeled = new Instances(unlabeled);
    // label instances
    for (int i = 0; i < unlabeled.numInstances(); i++) {
        System.out.println(labeled.instance(i).classValue());
        System.out.print(", actual: " + labeled.classAttribute().value((int) labeled.instance(i).classValue()));
        double clsLabel = cls.classifyInstance(unlabeled.instance(i));
        labeled.instance(i).setClassValue(clsLabel);
        System.out.println(", predicted: " + labeled.classAttribute().value((int) clsLabel));
    }
    // save labeled data
    System.out.println("ended");
}

------------------------ Output ---------------------------

1.0
Bud1? is a new new string.txtilocblob R(??????@? @ ? @ ? @  E ?DSDB ' @? @ ? @, predicted: *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES*
2.0
this is a new string *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES*
ended


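However, my problem is that the prediction comes out as *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES* when it should give me a class label.

When you save the classifier, also save the Instances (only the header, the data is not needed):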

Instances instancesSample = new Instances(structure, 0);
instancesSample.setClassIndex(1);
...
ObjectOutputStream oos = new ObjectOutputStream(
                        new FileOutputStream("classifier.model"));
oos.writeObject(f);
oos.writeObject(instancesSample);
oos.flush();
oos.close();

After loading the model, load the saved Instances as instancesSample. Then classify:

// read the classifier and the header back in the order they were written
ObjectInputStream objectInputStream = new ObjectInputStream(
        new BufferedInputStream(new FileInputStream("classifier.model")));
FilteredClassifier cls = (FilteredClassifier) objectInputStream.readObject();
Instances instancesSample = (Instances) objectInputStream.readObject();
objectInputStream.close();
int classIndex = 1;
// i is the loop index over the unlabeled instances
Instance ins = unlabeled.instance(i);
double clsLabel = cls.classifyInstance(ins);
// look the label up in the training header, not in the test data
String prediction = instancesSample.attribute(classIndex).value((int) clsLabel);
System.out.println(", predicted: " + prediction);
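The save and load steps above depend on ordinary Java serialization behaviour: objects written to one ObjectOutputStream come back from readObject() in the same order. Below is a minimal sketch of that round trip without Weka. The class name TwoObjectRoundTrip and the stand-in objects (a String in place of the classifier, a list of labels in place of the Instances header) are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Weka: shows the write order / read order contract.
public class TwoObjectRoundTrip {

    // Writes two serializable objects, one after the other, into a single stream.
    static byte[] writeBoth(Object model, Object header) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(model);
            oos.writeObject(header);
        }
        return bytes.toByteArray();
    }

    // Reads the two objects back in the order they were written.
    static Object[] readBoth(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            Object model = ois.readObject();
            Object header = ois.readObject();
            return new Object[] { model, header };
        }
    }

    public static void main(String[] args) throws Exception {
        String model = "stand-in for the classifier";
        List<String> header = new ArrayList<>(Arrays.asList("labelA", "labelB"));
        Object[] restored = readBoth(writeBoth(model, header));
        System.out.println(restored[0]);
        System.out.println(restored[1]);
    }
}
```

With Weka, the first object would be the FilteredClassifier and the second the empty Instances header. Swapping the two readObject() calls would fail the casts at load time.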

I added these lines to my classify method:

ArffLoader loader = new ArffLoader();
loader.setFile(new File("data.arff"));
Instances structure = loader.getDataSet();
structure.setClassIndex(1);

To get the class label, I changed the print statement as follows:

System.out.println(", predicted: " + structure.classAttribute().value((int) clsLabel));
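For a nominal class, classifyInstance returns the index of the predicted class as a double, so the label has to be looked up in the attribute that defines the classes in the training data. A minimal sketch of that lookup without Weka follows. The class LabelLookup, the method labelFor, and the example labels are hypothetical, and the list stands in for the nominal class attribute of the training header.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Weka: maps a predicted index to its label.
public class LabelLookup {

    // Converts a predicted class index (delivered as a double) to the label at that position.
    static String labelFor(double predictedIndex, List<String> classLabels) {
        int index = (int) predictedIndex;
        if (index < 0 || index >= classLabels.size()) {
            throw new IllegalArgumentException("No class label at index " + index);
        }
        return classLabels.get(index);
    }

    public static void main(String[] args) {
        // Stand-in for the values of the nominal class attribute in the training header.
        List<String> classLabels = Arrays.asList("labelA", "labelB", "labelC");
        System.out.println(", predicted: " + labelFor(1.0, classLabels));
    }
}
```

If the lookup is done against a different attribute, such as a string attribute in the test file, the index no longer refers to a class label.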
