Why does MALLET text classification output the same value 1.0 for all test files?



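I am learning MALLET's command-line text classification. The estimated output value is the same, 1.0, for files from different classes. I don't know what I am doing wrong. Can you help?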

MALLET version: E:\Mallet\mallet-2.0.8RC3

// There is one txt file about cat breeds (catmaterial.txt) in the cat dir.
// command 1
C:\Users\toshiba>mallet import-dir --input E:\Mallet\testmaterial\cat --output E:\Mallet\testmaterial\cat.mallet --remove-stopwords
// command 1 output
Labels =
   E:\Mallet\testmaterial\cat
// command 2: save the classifier as catClass.classifier
C:\Users\toshiba>mallet train-classifier --input E:\Mallet\testmaterial\cat.mallet --trainer NaiveBayes --output-classifier E:\Mallet\testmaterial\catClass.classifier
//command 2 output
Training portion = 1.0
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.0
-------------------- Trial 0  --------------------
Trial 0 Training NaiveBayesTrainer with 1 instances
Trial 0 Training NaiveBayesTrainer finished
No examples with predicted label !
No examples with true label !
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer training data accuracy = 1.0
Trial 0 Trainer NaiveBayesTrainer Test Data Confusion Matrix
No examples with predicted label !
Trial 0 Trainer NaiveBayesTrainer test data precision() = 1.0
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data recall() = 1.0
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data F1() = 1.0
Trial 0 Trainer NaiveBayesTrainer test data accuracy = NaN
NaiveBayesTrainer
Summary. train accuracy mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test accuracy mean = NaN stddev = NaN stderr = NaN
Summary. test precision() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test recall() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test f1() mean = 1.0 stddev = 0.0 stderr = 0.0
// command 3: estimate the classes of three files about cat, deer and dog. The cat file is the same one used to build cat.mallet.
C:\Users\toshiba>mallet classify-dir --input E:\Mallet\testmaterial\test_cat_dir --output - --classifier E:\Mallet\testmaterial\catClass.classifier

//command 3 output
file:/E:/Mallet/testmaterial/test_cat_dir/catmaterial.txt               1.0
file:/E:/Mallet/testmaterial/test_cat_dir/deertext.txt          1.0
file:/E:/Mallet/testmaterial/test_cat_dir/dogmaterial.txt               1.0
// Why are all three values 1.0?
C:\Users\toshiba>

Can you help? Thanks.

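Update:

Thank you for the answer, but every file still gets 1.0.

My idea is to put some dog files in the dog dir, use them as instances to train a model, and then classify some files in test_dir to see the results.

I tried to follow your suggestion as I understood it, but the output is still 1.0 for everything.

Could you help me with the commands below?

In E:\Mallet\train_dir\dog there are 4 dog txt files (dog 2.txt, dog4.txt, dog5.txt, dogmaterial.txt).

In E:\Mallet\test_dir there are 9 txt files (cat2.txt, catmaterial.txt, deermaterial.txt, dog3.txt, dog6.txt, dog 2.txt, dog4.txt, dog5.txt, dogmaterial.txt).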


C:\Users\toshiba>mallet import-dir --input E:\Mallet\train_dir\dog --output E:\Mallet\classifier_dir\3animal.mallet --remove-stopwords
Labels =
   E:\Mallet\train_dir\dog

C:\Users\toshiba>mallet train-classifier --input E:\Mallet\classifier_dir\3animal.mallet --trainer NaiveBayes --output-classifier E:\Mallet\classifier_dir\3animalClass.classifier
Training portion = 1.0
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.0                          
-------------------- Trial 0  --------------------
Trial 0 Training NaiveBayesTrainer with 4 instances
Trial 0 Training NaiveBayesTrainer finished
No examples with predicted label !
No examples with true label !
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer training data accuracy = 1.0
Trial 0 Trainer NaiveBayesTrainer Test Data Confusion Matrix
No examples with predicted label !
Trial 0 Trainer NaiveBayesTrainer test data precision() = 1.0
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data recall() = 1.0
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data F1() = 1.0
Trial 0 Trainer NaiveBayesTrainer test data accuracy = NaN
NaiveBayesTrainer
Summary. train accuracy mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test accuracy mean = NaN stddev = NaN stderr = NaN
Summary. test precision() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test recall() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test f1() mean = 1.0 stddev = 0.0 stderr = 0.0

C:\Users\toshiba>mallet classify-dir --input E:\Mallet\test_dir --output - --classifier E:\Mallet\classifier_dir\3animalClass.classifier
file:/E:/Mallet/test_dir/cat2.txt               1.0
file:/E:/Mallet/test_dir/catmaterial.txt                1.0
file:/E:/Mallet/test_dir/deertext.txt           1.0
file:/E:/Mallet/test_dir/dog%202.txt            1.0
file:/E:/Mallet/test_dir/dog3.txt               1.0
file:/E:/Mallet/test_dir/dog4.txt               1.0
file:/E:/Mallet/test_dir/dog5.txt               1.0
file:/E:/Mallet/test_dir/dog6.txt               1.0
file:/E:/Mallet/test_dir/dogmaterial.txt                1.0
C:\Users\toshiba>

Thanks.

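There are two input options. import-dir treats each directory as a class and each file in that directory as one input instance. import-file reads an input file line by line and treats the fields within each line as the label and the instance data. You are using the input type meant for files, so you are creating a classifier with one class and one instance. I'm guessing you want the dir type.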
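The same reasoning applies to the commands shown in the question: each import-dir call is given a single directory, so the classifier knows only one label, and a probability distribution over one label has to put all of its mass there. The following is a minimal Python sketch of a multinomial Naive Bayes posterior, not MALLET's actual implementation. The function name `posteriors`, the toy word counts, and the add-one smoothing are all assumptions made for illustration.

```python
import math

def posteriors(doc_tokens, word_counts, priors, alpha=1.0):
    """Return P(label | document) for a toy multinomial Naive Bayes model.

    word_counts maps label -> {word: count}; priors maps label -> P(label).
    This is an illustrative sketch, not MALLET's code.
    """
    vocab = {w for counts in word_counts.values() for w in counts}
    log_scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        score = math.log(priors[label])
        for tok in doc_tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((counts.get(tok, 0) + alpha) / total)
        log_scores[label] = score
    # Normalize the scores so they sum to 1 (log-sum-exp for stability).
    peak = max(log_scores.values())
    norm = sum(math.exp(s - peak) for s in log_scores.values())
    return {label: math.exp(s - peak) / norm for label, s in log_scores.items()}

# Hypothetical toy data: a model trained on a single "dog" directory.
one_class = {"dog": {"dog": 5, "bark": 3, "tail": 2}}
one_prior = {"dog": 1.0}
print(posteriors(["dog", "bark"], one_class, one_prior))  # the only label gets 1.0
print(posteriors(["cat", "purr"], one_class, one_prior))  # still 1.0 for a cat text

# With a second class the scores can finally differ between documents.
two_class = {"dog": {"dog": 5, "bark": 3, "tail": 2},
             "cat": {"cat": 5, "purr": 3, "tail": 2}}
two_prior = {"dog": 0.5, "cat": 0.5}
print(posteriors(["cat", "purr"], two_class, two_prior))
```

With one label, normalization divides the single score by itself, so the content of the test file cannot change the result. To get informative scores, the training data needs at least two class directories (for example separate dog, cat and deer directories), each holding its own example files.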
