我试图实现数字识别器,下面的博客文章详细讨论了这个问题:http://www.markhneedham.com/blog/2012/10/27/kaggle-digit-recognizer-mahout-random-forest-attempt/
当我执行Java程序时,我收到以下错误:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/classifier/df/builder/TreeBuilder
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.classifier.df.builder.TreeBuilder
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: com.mawasthi.mahout.test.MahoutDigitRecognizer. Program will exit.
我有以下代码用于使用ApacheMahout实现数字识别器。
package com.mawasthi.mahout.test;
import java.lang.Math;
import java.util.ArrayList;
import java.util.Random;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.mahout.classifier.df.data.Data;
import org.apache.mahout.classifier.df.data.Instance;
import org.apache.mahout.classifier.df.data.DataLoader;
import org.apache.mahout.classifier.df.DecisionForest;
import org.apache.mahout.classifier.df.builder.DefaultTreeBuilder;
import org.apache.mahout.classifier.df.ref.SequentialBuilder;
import org.apache.mahout.common.RandomUtils;
import org.apache.commons.math3.util.FastMath;
public class MahoutDigitRecognizer {
public static void main(String[] args) throws Exception {
// Build RF
String descriptor = "L N N N N N N N N N N N N N N N N N N N";
String[] trainDataValues = fileAsStringArray("data/train.csv");
Data data = DataLoader.loadData(DataLoader.generateDataset(descriptor, false, trainDataValues), trainDataValues);
int numberOfTrees = 100;
DecisionForest forest = buildForest(numberOfTrees,data);
// Test
String[] testDataValues = testFileAsStringArray("data/test.csv");
Data testData = DataLoader.loadData(data.getDataset(), testDataValues);
Random rng = RandomUtils.getRandom();
for (int i=0;i<testData.size();i++) {
Instance oneSample = testData.get(i);
double classify = forest.classify(testData.getDataset(), rng, oneSample);
int label = data.getDataset().valueOf(0, String.valueOf((int)classify));
System.out.println("Label: " + label);
}
}
private static DecisionForest buildForest(int numberOfTrees, Data data) {
int m = (int) Math.floor(FastMath.log(2.0, (double)data.getDataset().nbAttributes()) + 1);
DefaultTreeBuilder treeBuilder = new DefaultTreeBuilder();
treeBuilder.setM(m);
return new SequentialBuilder(RandomUtils.getRandom(), treeBuilder, data.clone()).build(numberOfTrees);
}
private static String[] fileAsStringArray(String file) throws Exception {
ArrayList<String> list = new ArrayList<String>();
DataInputStream in = new DataInputStream(new FileInputStream(file));
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String line;
br.readLine(); // discard the header row
while((line = br.readLine()) != null) {
list.add(line);
}
in.close();
return list.toArray(new String[list.size()]);
}
private static String[] testFileAsStringArray(String file) throws Exception {
ArrayList<String> list = new ArrayList<String>();
DataInputStream in = new DataInputStream(new FileInputStream(file));
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String line;
br.readLine();
while((line = br.readLine()) != null) {
list.add("-," + line);
}
in.close();
return list.toArray(new String[list.size()]);
}
}
以下是我的POM.XML文件:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.mawasthi.mahout.test</groupId>
<artifactId>mvn-mahout-test</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.6.4</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-core</artifactId>
<version>0.7</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-math3</artifactId>
<version>3.0</version>
</dependency>
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-utils</artifactId>
<version>0.5</version>
</dependency>
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-math</artifactId>
<version>0.4</version>
</dependency>
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-collections</artifactId>
<version>1.0</version>
</dependency>
</dependencies>
<build>
<finalName>mvn-mahout-test</finalName>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<outputDirectory>${basedir}</outputDirectory>
</configuration>
</plugin>
</plugins>
</build>
</project>
作为Java的新手,我错过了使依赖的JARS可用的要点。我最终将它们捆绑到一个RunnableJAR中(我知道——这不是一个好主意),并能够运行它!