Storing data in an ARFF file



I am using this example to create an .arff file for my Weka project.

double[][] data = {
    {4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0},
    {19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0}
};
int numInstances = data[0].length;
FastVector atts = new FastVector();
ArrayList<Instance> instances = new ArrayList<Instance>();
for (int dim = 0; dim < 2; dim++) {
    // Create new attribute / dimension
    Attribute current = new Attribute("Attribute" + dim, dim);
    // Create an instance for each data object
    if (dim == 0) {
        for (int obj = 0; obj < numInstances; obj++) {
            instances.add(new SparseInstance(0));
        }
    }
    // Fill the value of dimension "dim" into each object
    for (int obj = 0; obj < numInstances; obj++) {
        instances.get(obj).setValue(current, data[dim][obj]);
        System.out.println(instances.get(obj));
    }
    // Add attribute to total attributes
    atts.addElement(current);
}
// Create new dataset
Instances newDataset = new Instances("Dataset", atts, instances.size());
// Fill in data objects
for (Instance inst : instances) {
    newDataset.add(inst);
}
BufferedWriter writer = new BufferedWriter(new FileWriter("test.arff"));
writer.write(newDataset.toString());
writer.flush();
writer.close();

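I noticed that the resulting file lays out the elements of each row of my array as a column of the .arff file. I want each whole row on a single line instead: the first row of the array as the first line of the .arff file, the second row as the second line, and so on. How can I do this? In my case, the last column of the 2D array holds the label of that row's data.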

Expected result in my ARFF file:

4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0, 1 // for example the first row
19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0, 0 // the second row

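The code in the example treats each column of the table as an instance (so there are 29 instances, each with two attributes). It sounds like you want to treat each row as an instance instead (giving two instances, each with 29 attributes):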

double[][] data = {
{4058.0, 4059.0, ... }, /* first instance */
{19.0, 20.0, ... }      /* second instance */
};
int numAtts = data[0].length;
FastVector atts = new FastVector(numAtts);
for (int att = 0; att < numAtts; att++)
{
    atts.addElement(new Attribute("Attribute" + att, att));
}
int numInstances = data.length;
Instances dataset = new Instances("Dataset", atts, numInstances);
for (int inst = 0; inst < numInstances; inst++)
{
    dataset.add(new Instance(1.0, data[inst]));
}
BufferedWriter writer = new BufferedWriter(new FileWriter("test.arff"));
writer.write(dataset.toString());
writer.flush();
writer.close();

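I replaced SparseInstance with Instance because almost all of the attribute values are non-zero. Note that in Weka 3.7, Instance became an interface, so you should use DenseInstance instead. Also, FastVector has been deprecated in favor of Java's ArrayList.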
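To make the target layout concrete, here is a minimal plain-Java sketch that writes the ARFF text directly, without Weka, so you can see that each row of the array becomes one line under @data with its label last. It assumes the labels are kept in a separate int[] (one per row) and declared as a nominal attribute `class {0,1}`; the class name ArffRowWriter, the method toArff and the attribute names are hypothetical, chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: writes a 2D array as ARFF text with ONE ROW PER INSTANCE
// and appends a nominal class label (assumed to be 0 or 1) to every row.
// Plain Java only; no Weka classes are used.
public class ArffRowWriter {

    // rows[i] becomes the i-th line under @data, followed by labels[i].
    public static String toArff(String relation, double[][] rows, int[] labels) {
        if (rows.length != labels.length) {
            throw new IllegalArgumentException("need exactly one label per row");
        }
        int numAtts = rows.length == 0 ? 0 : rows[0].length;
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        // One numeric attribute per column, named as in the answer ("Attribute" + att).
        for (int att = 0; att < numAtts; att++) {
            sb.append("@attribute Attribute").append(att).append(" numeric\n");
        }
        // Assumption: the label is a binary nominal attribute declared last.
        sb.append("@attribute class {0,1}\n\n@data\n");
        for (int i = 0; i < rows.length; i++) {
            if (rows[i].length != numAtts) {
                throw new IllegalArgumentException("row " + i + " has a different length");
            }
            for (double v : rows[i]) {
                sb.append(v).append(',');   // append(double) prints e.g. 4058.0
            }
            sb.append(labels[i]).append('\n'); // the label closes the line
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Shortened versions of the two rows from the question.
        double[][] data = {
            {4058.0, 4059.0, 4060.0},
            {19.0, 20.0, 21.0}
        };
        int[] labels = {1, 0};
        String arff = toArff("Dataset", data, labels);
        Files.write(Paths.get("test.arff"), arff.getBytes(StandardCharsets.UTF_8));
        System.out.print(arff);
    }
}
```

When staying inside Weka instead, the corresponding step is to declare the label as a nominal Attribute added last and call setClassIndex(numAttributes() - 1) on the Instances object, so that classifiers treat that column as the class.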
