Weka: Adding a new instance to a dataset



I have a Weka dataset:

@attribute uid numeric
@attribute itemid numeric
@attribute rating numeric
@attribute timestamp numeric
@data
196 242 3   881250949
186 302 3   891717742
22  377 1   878887116
196 51  5   881250949
244 51  2   880606923

If I want to add a new instance like this:

244 59  2   880606923

How should I do that?

Is it something like this?

Instances newData = arffLoader.getDataSet();
for (int i = 0; i < newData.numInstances(); i++) {
    Instance one = newData.instance(i);
    one.setDataset(data);
    data.add(one);
}

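Try the code below. What you need to do is create a double array for your new values, wrap it in an instance using the DenseInstance class, and add that instance to the dataset.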

import weka.core.DenseInstance;
import weka.core.Instances;

// The class name is illustrative; the original answer showed only the main method.
public class AddInstanceExample {

    public static void main(String[] args) {
        String dataSetFileName = "stackoverflowQuestion.arff";
        Instances data = MyUtilsForWekaInstanceHelper.getInstanceFromFile(dataSetFileName);
        System.out.println("Before adding");
        System.out.println(data);

        // One slot per attribute, in declaration order: uid, itemid, rating, timestamp
        double[] instanceValue1 = new double[data.numAttributes()];
        instanceValue1[0] = 244;
        instanceValue1[1] = 59;
        instanceValue1[2] = 2;
        instanceValue1[3] = 880606923;
        // The first argument is the instance weight
        DenseInstance denseInstance1 = new DenseInstance(1.0, instanceValue1);
        data.add(denseInstance1);
        System.out.println("-----------------------------------------------------------");
        System.out.println("After adding");
        System.out.println(data);
    }
}

import java.io.BufferedReader;
import java.io.FileReader;

import weka.core.Instances;

public class MyUtilsForWekaInstanceHelper {

    public static Instances getInstanceFromFile(String pFileName) {
        // try-with-resources closes the reader even if parsing fails
        try (BufferedReader reader = new BufferedReader(new FileReader(pFileName))) {
            Instances data = new Instances(reader);
            // use the last attribute as the class attribute
            data.setClassIndex(data.numAttributes() - 1);
            return data;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

The output is as follows:

Before adding
@relation stackoverflowQuestion
@attribute uid numeric
@attribute itemid numeric
@attribute rating numeric
@attribute timestamp numeric
@data
196,242,3,881250949
186,302,3,891717742
22,377,1,878887116
196,51,5,881250949
244,51,2,880606923
---------------------------------------------------------------------------------
After adding
@relation stackoverflowQuestion
@attribute uid numeric
@attribute itemid numeric
@attribute rating numeric
@attribute timestamp numeric
@data
196,242,3,881250949
186,302,3,891717742
22,377,1,878887116
196,51,5,881250949
244,51,2,880606923
244,59,2,880606923
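
If the new row arrives as text, as it does in the question, the double array can be built by splitting the line instead of assigning each index by hand. The following is a minimal sketch under that assumption; `ArffRowParser` and `parseRow` are hypothetical names, not part of Weka, and the helper only handles numeric attributes like the four in this dataset.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Weka): turns one text row into numeric values.
class ArffRowParser {

    /**
     * Splits a whitespace- or comma-separated row and parses each token as a double.
     * Throws IllegalArgumentException if the token count does not match the
     * number of attributes the dataset declares.
     */
    static double[] parseRow(String row, int expectedAttributes) {
        String[] tokens = row.trim().split("[\\s,]+");
        if (tokens.length != expectedAttributes) {
            throw new IllegalArgumentException(
                "Expected " + expectedAttributes + " values but found " + tokens.length);
        }
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        // The row from the question; the dataset declares four attributes.
        double[] values = parseRow("244 59  2   880606923", 4);
        System.out.println(Arrays.toString(values));
    }
}
```

The resulting array could then be passed to `new DenseInstance(1.0, values)`, with `data.numAttributes()` supplied as the expected count.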

You can simply append the new row to the ARFF file, like this:

String filename = "MyDataset.arff";
FileWriter fwriter = new FileWriter(filename, true); // true opens the file in append mode
fwriter.write("244 59  2   880606923\n"); // appends the new row after the existing @data rows
fwriter.close();
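
Appending this way only works cleanly if the file already ends with a line break; otherwise the new row is glued onto the last existing one. It also changes only the file on disk, so an `Instances` object that is already loaded has to be reloaded to see the row. The sketch below guards against the first problem using only the standard library; `ArffAppender` and `appendRow` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Weka): appends one data row to an ARFF file.
class ArffAppender {

    /**
     * Appends the row on its own line. If the file does not already end with a
     * newline, one is written first so the new row does not merge with the last one.
     */
    static void appendRow(String fileName, String row) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(fileName, "rw")) {
            long length = file.length();
            boolean needsNewline = false;
            if (length > 0) {
                // Inspect the final byte of the existing content.
                file.seek(length - 1);
                needsNewline = file.read() != '\n';
            }
            file.seek(length);
            String text = (needsNewline ? "\n" : "") + row + "\n";
            file.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // The file name comes from the answer above; pass another path as the first argument.
        String fileName = args.length > 0 ? args[0] : "MyDataset.arff";
        appendRow(fileName, "244 59  2   880606923");
    }
}
```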

A new instance can easily be added to any existing dataset as follows:

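// assuming the ARFF file is already loaded into a variable called dataset
// Weka 3.7+: Instance is an interface, so create a DenseInstance
// (on Weka 3.6 and earlier, use new Instance(dataset.numAttributes()) instead)
Instance newInstance = new DenseInstance(dataset.numAttributes());
for (int i = 0; i < dataset.numAttributes(); i++) {
    // i is the index of the attribute
    // value is a placeholder for the value you want to set for that attribute
    newInstance.setValue(i, value);
}
// add the new instance to the end of the main dataset
dataset.add(newInstance);
// repeat as necessary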
