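WEKA classification likelihood of the classes

I would like to know if there is a way in WEKA to output a number of "best guesses" for a classification.

My scenario is: I classify the data with cross-validation, for instance, and then in Weka's output I get something like: these are the 3 best guesses for the classification of this instance. What I want is that, even if an instance is not correctly classified, I get an output of the 3 or 5 best guesses for that instance.

Example:

Classes: A, B, C, D, E. Instances: 1...10

The output would be: instance 1 is 90% likely to be class A, 75% likely to be class B, 60% likely to be class C.

Thanks.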

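Weka's API has a method called Classifier.distributionForInstance() that you can use to get the class prediction distribution. You can then sort the distribution by decreasing probability to get the top-N predictions.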
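The sorting step can be sketched without any Weka dependency. This is only a minimal illustration: the class name TopPredictions and the method topN are made up for this example, and the distribution array is assumed to be what distributionForInstance() returned, with labels[i] being the class label at index i.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Weka.
class TopPredictions {

    /**
     * Returns the n most probable classes, most probable first.
     * Assumes distribution[i] is the probability of labels[i].
     */
    static List<String> topN(double[] distribution, String[] labels, int n) {
        // Collect the class indices and sort them by descending probability.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < distribution.length; i++) {
            indices.add(i);
        }
        indices.sort(
            Comparator.comparingDouble((Integer i) -> distribution[i]).reversed());

        // Keep at most n entries, formatted as "label (probability)".
        List<String> best = new ArrayList<>();
        for (int k = 0; k < Math.min(n, indices.size()); k++) {
            int idx = indices.get(k);
            best.add(String.format(Locale.ROOT, "%s (%.3f)",
                                   labels[idx], distribution[idx]));
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up distribution over the classes A..E from the question.
        double[] distribution = {0.05, 0.50, 0.10, 0.30, 0.05};
        String[] labels = {"A", "B", "C", "D", "E"};
        // Prints [B (0.500), D (0.300), C (0.100)]
        System.out.println(topN(distribution, labels, 3));
    }
}
```

With a real model, the labels would come from the class attribute (classAttribute().value(i)) of the dataset. Note that the probabilities in a distribution sum to 1, so the three best guesses cannot be 90%, 75% and 60% at the same time.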

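Below is a function that prints out: (1) the test instance's ground truth label; (2) the predicted label from classifyInstance(); and (3) the prediction distribution from distributionForInstance(). I have used this with J48, but it should work with other classifiers.

The input parameters are the serialized model file (which you can create during the model training phase by applying the -d option) and the test file in ARFF format.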

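import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public void test(String modelFileSerialized, String testFileARFF) 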
    throws Exception
{
    // Deserialize the classifier.
    Classifier classifier = 
        (Classifier) weka.core.SerializationHelper.read(
            modelFileSerialized);
    // Load the test instances.
    Instances testInstances = DataSource.read(testFileARFF);
    // Mark the last attribute in each instance as the true class.
    testInstances.setClassIndex(testInstances.numAttributes()-1);
    int numTestInstances = testInstances.numInstances();
    System.out.printf("There are %d test instances\n", numTestInstances);
    // Loop over each test instance.
    for (int i = 0; i < numTestInstances; i++)
    {
        // Get the true class label from the instance's own classIndex.
        String trueClassLabel = 
            testInstances.instance(i).toString(testInstances.classIndex());
        // Make the prediction here.
        double predictionIndex = 
            classifier.classifyInstance(testInstances.instance(i)); 
        // Get the predicted class label from the predictionIndex.
        String predictedClassLabel =
            testInstances.classAttribute().value((int) predictionIndex);
        // Get the prediction probability distribution.
        double[] predictionDistribution = 
            classifier.distributionForInstance(testInstances.instance(i)); 
        // Print out the true label, predicted label, and the distribution.
        System.out.printf("%5d: true=%-10s, predicted=%-10s, distribution=", 
                          i, trueClassLabel, predictedClassLabel); 
        // Loop over all the prediction labels in the distribution.
        for (int predictionDistributionIndex = 0; 
             predictionDistributionIndex < predictionDistribution.length; 
             predictionDistributionIndex++)
        {
            // Get this distribution index's class label.
            String predictionDistributionIndexAsClassLabel = 
                testInstances.classAttribute().value(
                    predictionDistributionIndex);
            // Get the probability.
            double predictionProbability = 
                predictionDistribution[predictionDistributionIndex];
            System.out.printf("[%10s : %6.3f]", 
                              predictionDistributionIndexAsClassLabel, 
                              predictionProbability );
        }
        System.out.printf("\n");
    }
}

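I don't know if you can do it natively, but you can get the probabilities for each class, sort them, and take the top three.

The function you want is distributionForInstance(Instance instance), which returns a double[] giving the probability of each class.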

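Not in general. Not all classifiers provide the information you need. In most cases (for decision trees, for example) the decision is clear-cut (although possibly incorrect), and there is no confidence value. Your task requires classifiers that can handle uncertainty (such as the naive Bayes classifier).

Technically, the easiest thing to do is probably to train the model and then classify an individual instance, for which Weka should give you the desired output. In general you can of course also do this for sets of instances, but I don't think Weka provides that out of the box. You would probably have to customize the code or use it through an API (for example in R).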

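When you calculate the probability for an instance, how exactly do you do it?

I have posted the PART rules and the data for the new instance here, but as far as calculating it manually goes, I'm not quite sure how to do this! Thanks

Edit: now calculated: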

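private float[] getProbDist(String split) {
    // Takes in something such as (52/2), meaning 52 instances correctly
    // classified and 2 instances incorrectly classified.
    // Assumed: prob_dis holds the counts as strings, e.g. {"52", "2"},
    // obtained by stripping the parentheses and splitting on "/".
    String[] prob_dis = split.replaceAll("[()]", "").split("/");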

    if(prob_dis.length > 2)
        return null;
    if(prob_dis.length == 1){
        String temp = prob_dis[0];
        prob_dis = new String[2];
        prob_dis[0] = "1";
        prob_dis[1] = temp; 
    }
    float p1 = Float.parseFloat(prob_dis[0]);
    float p2 = Float.parseFloat(prob_dis[1]);
    // assumes two tags
    float[] tag_prob = new float[2];
    // Compute tag_prob[0] first; tag_prob[1] is its complement.
    tag_prob[0] = p2 / p1;
    tag_prob[1] = 1 - tag_prob[0];
    // returns float[] holding the two probabilities
    return tag_prob;
}
