Mallet NaiveBayes classifier throws a null pointer exception in Java

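I am trying to instantiate a naive Bayes classifier to classify blocks of text (using predefined categories). The example below just tries to do it with male/female. I have tried both loading the data from a file (CSVLOADER) and creating the instances as shown below. The problem is that the trainer.train() method throws a null pointer exception. This seems to be because the target dictionary is null, while the data dictionary is populated. How can I force the instances to populate the target dictionary?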

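My actual goal is to classify the paper abstracts in my database as "science", "politics", "law", "health", etc. It seems that a Bayes classifier is the right choice for this.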

I have iterated over the loaded InstanceList and it seems to be populated correctly, and the data dictionary is populated, but the TargetDictionary is null.

Using Mallet 2.0.8 on Windows.

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.regex.Pattern;

import cc.mallet.classify.ClassifierTrainer;
import cc.mallet.classify.NaiveBayes;
import cc.mallet.classify.NaiveBayesTrainer;
import cc.mallet.pipe.*;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

public class TestMallet {
    public TestMallet() throws IOException {
        ArrayList<Pipe> pipelist = new ArrayList<Pipe>();
        pipelist.add(new CharSequenceLowercase());
        // Backslashes must be doubled inside Java string literals
        pipelist.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
        pipelist.add(new TokenSequenceRemoveStopwords(new File("c:\\test\\config\\stopwords_en.txt"), "UTF-8", false, false, false));
        pipelist.add(new TokenSequence2FeatureSequence());
        pipelist.add(new FeatureSequence2FeatureVector()); // Added, but it doesn't make any difference
        InstanceList instances = new InstanceList(new SerialPipes(pipelist));
        // Instance(data, target, name, source)
        Instance instance0 = new Instance("Hello World I am here and i am male my name is roger",   "Male",   "roger", "test");
        Instance instance1 = new Instance("Hello World I am here and i am male my name is phil",    "Male",   "phil",  "test");
        Instance instance2 = new Instance("Hello World I am here and i am male my name is joe",     "Male",   "joe",   "test");
        Instance instance3 = new Instance("Hello World I am here and i am female my name is vira",  "Female", "vira",  "test");
        Instance instance4 = new Instance("Hello World I am here and i am female my name is josie", "Female", "josie", "test");
        instances.addThruPipe(instance0);
        instances.addThruPipe(instance1);
        instances.addThruPipe(instance2);
        instances.addThruPipe(instance3);
        instances.addThruPipe(instance4);
        // Using the InstanceList to train
        // -------------------------------
        ClassifierTrainer<NaiveBayes> trainer = new NaiveBayesTrainer();
        trainer.train(instances); // NullPointerException here (debugging, it looks like TargetDictionary is null)
    }
}

Expected result: the trainer trains on the instances without errors.

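A classifier learns to predict outputs from input features. In both cases (inputs and outputs) we usually need to convert strings into a numeric representation. Your pipeline tells Mallet how to do this conversion for the input features, but not for the output labels.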
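
To make the string-to-number point concrete, here is a plain-Java sketch that uses no Mallet classes at all. `StringIndexer` is a hypothetical helper standing in for the idea behind an "alphabet": each distinct string gets a stable integer index the first time it is seen. It is not Mallet's API. One indexer handles the tokens (the data side) and a second one handles the labels (the target side).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not Mallet's API): assigns each distinct string
// a stable integer index the first time it is seen.
class StringIndexer {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> entries = new ArrayList<>();

    // Returns the index of s, assigning the next free index if s is new.
    int index(String s) {
        Integer id = ids.get(s);
        if (id == null) {
            id = entries.size();
            ids.put(s, id);
            entries.add(s);
        }
        return id;
    }

    // Maps an index back to its original string.
    String stringAt(int id) {
        return entries.get(id);
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        StringIndexer features = new StringIndexer(); // indexes input tokens (data side)
        StringIndexer labels = new StringIndexer();   // indexes output labels (target side)

        String[][] docs = {
            {"hello world i am male", "Male"},
            {"hello world i am female", "Female"},
            {"hello world i am male too", "Male"},
        };
        for (String[] doc : docs) {
            StringBuilder tokenIds = new StringBuilder();
            for (String token : doc[0].split(" ")) {
                tokenIds.append(features.index(token)).append(' ');
            }
            int labelId = labels.index(doc[1]);
            System.out.println(tokenIds.toString().trim() + " -> label " + labelId
                    + " (" + labels.stringAt(labelId) + ")");
        }
        System.out.println("distinct tokens: " + features.size() + ", distinct labels: " + labels.size());
    }
}
```

Running `main` prints `0 1 2 3 4 -> label 0 (Male)`, `0 1 2 3 5 -> label 1 (Female)`, `0 1 2 3 4 6 -> label 0 (Male)` and finally `distinct tokens: 7, distinct labels: 2`. Without the second indexer the labels would stay raw strings, which is roughly the situation in the question: the data side has a dictionary, the target side does not.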

Adding a Target2Label() pipe should do it; see the Csv2Vectors class for an example.
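
A minimal sketch of the question's pipeline with a Target2Label pipe added, assuming Mallet 2.0.8 is on the classpath. The class name `GenderNaiveBayesSketch` and the `buildInstances` helper are hypothetical, and Mallet's built-in English stoplist (`TokenSequenceRemoveStopwords(false, false)`) replaces the question's `stopwords_en.txt` file so the sketch runs without that file.

```java
import cc.mallet.classify.NaiveBayes;
import cc.mallet.classify.NaiveBayesTrainer;
import cc.mallet.pipe.CharSequence2TokenSequence;
import cc.mallet.pipe.CharSequenceLowercase;
import cc.mallet.pipe.FeatureSequence2FeatureVector;
import cc.mallet.pipe.Pipe;
import cc.mallet.pipe.SerialPipes;
import cc.mallet.pipe.Target2Label;
import cc.mallet.pipe.TokenSequence2FeatureSequence;
import cc.mallet.pipe.TokenSequenceRemoveStopwords;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

import java.util.ArrayList;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch assuming Mallet 2.0.8 on the classpath.
class GenderNaiveBayesSketch {

    static InstanceList buildInstances() {
        ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
        // The missing piece: convert the String target ("Male"/"Female") into a Label,
        // which gives the InstanceList a LabelAlphabet as its target alphabet.
        pipeList.add(new Target2Label());
        pipeList.add(new CharSequenceLowercase());
        pipeList.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
        // Assumption: built-in English stoplist instead of the question's stopwords file.
        pipeList.add(new TokenSequenceRemoveStopwords(false, false));
        pipeList.add(new TokenSequence2FeatureSequence());
        pipeList.add(new FeatureSequence2FeatureVector());

        InstanceList instances = new InstanceList(new SerialPipes(pipeList));
        String[][] rows = {
            {"Hello World I am here and i am male my name is roger", "Male", "roger"},
            {"Hello World I am here and i am male my name is phil", "Male", "phil"},
            {"Hello World I am here and i am male my name is joe", "Male", "joe"},
            {"Hello World I am here and i am female my name is vira", "Female", "vira"},
            {"Hello World I am here and i am female my name is josie", "Female", "josie"},
        };
        for (String[] row : rows) {
            // Instance(data, target, name, source), as in the question.
            instances.addThruPipe(new Instance(row[0], row[1], row[2], "test"));
        }
        return instances;
    }

    public static void main(String[] args) {
        InstanceList instances = buildInstances();
        System.out.println("labels in target alphabet: " + instances.getTargetAlphabet().size());
        NaiveBayes classifier = new NaiveBayesTrainer().train(instances);
        // Sanity check: classify the training instances themselves.
        for (Instance inst : instances) {
            System.out.println(inst.getName() + " -> "
                    + classifier.classify(inst).getLabeling().getBestLabel());
        }
    }
}
```

The only change that matters for the exception is the `Target2Label` line; the rest mirrors the question. Because Target2Label assigns one label index per distinct target string, the same pipeline should carry over to more categories, such as the abstracts use case.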
