(I'm using Python and scikit-learn / sklearn.) I have a dataset with (a lot of) objects in this format:
{"word":"something", "data":[12, 24, 54, 65, 76, 87, 45, 65, 32, 12, 65, 13, 54, 76, 45, 72, 12, 11, 54, 23, 65]}
I have several of these entries per word. I built a sample dataset with 100 words and 3,000 entries per word. It was produced by a script that generates one hundred "seeds"; from each seed it creates 3,000 entries by varying every number of the "data" array by a random amount of at most ±15 (to simulate the random variation of a real-life sensor).
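(This isn't my exact script, but the generator works roughly along these lines; the names and value ranges here are just illustrative:)

```python
import random

NUM_WORDS = 100          # number of distinct words ("seeds")
SAMPLES_PER_WORD = 3000  # noisy copies generated from each seed
FEATURE_LEN = 21         # length of the "data" array (as in the example above)

def make_dataset():
    dataset = []
    for w in range(NUM_WORDS):
        word = f"word_{w:03d}"
        # the seed is one clean "data" array for this word
        seed = [random.randint(0, 1000) for _ in range(FEATURE_LEN)]
        for _ in range(SAMPLES_PER_WORD):
            # each entry varies every number of the seed by at most ±15,
            # simulating sensor noise
            noisy = [v + random.randint(-15, 15) for v in seed]
            dataset.append({"word": word, "data": noisy})
    return dataset
```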
From that dataset I saved about 297,000 entries to a (Mongo) DB called "Words" as the training set, and the other 3,000 to a second DB (called "Test") for testing.
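Loading them back out for scikit-learn looks roughly like this (the collection name "samples" is just a placeholder; the field names follow the format shown above):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

def load_collection(db_name, coll_name):
    """Pull every document and split it into a feature matrix X and label list y."""
    X, y = [], []
    for doc in client[db_name][coll_name].find():
        X.append(doc["data"])   # the numeric array
        y.append(doc["word"])   # the word this entry belongs to
    return X, y

X_train, y_train = load_collection("Words", "samples")  # ~297,000 training rows
X_test, y_test = load_collection("Test", "samples")     # 3,000 held-out rows
```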
Now, the problem I'm having is that of the 3,000 tests I run, only 20 give a prediction with an accuracy score of 1.0. Those results don't look right to me, so I think I'm not setting up the classifier correctly.
I have tried DecisionTree and KNeighborsClassifier. I suspect neither of these classifiers is suitable for the kind of data I want to use. Which classifier should I use? Could you give an example?
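For reference, the fit/score pattern I'm following is basically the standard scikit-learn one (simplified; assuming the X/y arrays loaded above):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# X_train / y_train come from the "Words" DB, X_test / y_test from the "Test" DB
clf = KNeighborsClassifier(n_neighbors=5)   # or DecisionTreeClassifier()
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
print(accuracy_score(y_test, predictions))
```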
EDIT
I'm pasting a chunk of the DB below (I have about 300,000 entries like these, with 1,000 repetitions per word). The fields are named "label" and "features" because a YouTube video told me that's what they're called, haha:
{"label":"XpTrKrqjOC","features":[152,179,848,12,499,408,405,377,228,222]}
{"label":"XpTrKrqjOC","features":[157,170,843,17,502,411,402,373,236,219]}
{"label":"XpTrKrqjOC","features":[156,177,844,22,503,413,398,380,236,227]}
{"label":"XpTrKrqjOC","features":[157,172,847,22,504,416,401,379,238,222]}
{"label":"XpTrKrqjOC","features":[157,177,846,15,499,417,397,376,238,221]}
{"label":"XpTrKrqjOC","features":[155,176,846,14,508,410,400,370,229,225]}
{"label":"cOYHgaxByT","features":[230,1,190,985,173,483,178,216,601,309]}
{"label":"cOYHgaxByT","features":[235,6,188,985,170,486,183,216,605,312]}
{"label":"cOYHgaxByT","features":[235,2,188,985,171,478,175,216,600,314]}
{"label":"cOYHgaxByT","features":[234,-4,190,987,177,478,177,220,600,309]}
{"label":"cOYHgaxByT","features":[235,-1,191,983,172,478,180,219,598,306]}
{"label":"cOYHgaxByT","features":[234,-1,190,983,178,480,174,221,597,313]}
{"label":"cOYHgaxByT","features":[225,-4,195,990,170,479,181,221,602,307]}
{"label":"ZWmNqLVaIZ","features":[546,73,52,445,193,175,158,561,317,503]}
{"label":"ZWmNqLVaIZ","features":[551,69,52,440,198,172,154,566,312,504]}
{"label":"ZWmNqLVaIZ","features":[543,77,55,445,193,179,163,565,313,508]}
{"label":"ZWmNqLVaIZ","features":[550,72,56,443,193,180,161,563,319,502]}
{"label":"ZWmNqLVaIZ","features":[542,77,55,450,194,173,155,558,315,501]}
{"label":"ZWmNqLVaIZ","features":[543,72,57,450,191,176,156,560,318,508]}
{"label":"ZWmNqLVaIZ","features":[550,68,49,443,194,180,154,563,312,500]}
I think MultinomialNB should work fine if you first process the data with LabelEncoder to get a sparse matrix where the examples are the rows and the words are the columns.
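A minimal sketch of that idea (assuming X_train / y_train as loaded earlier; note that MultinomialNB expects non-negative, count-like features, so the occasional negative reading would need to be clipped or shifted first):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import MultinomialNB

# X: one row per example; y: the word label of each row
X = np.array(X_train)
y_enc = LabelEncoder().fit_transform(y_train)   # word strings -> integer classes

# MultinomialNB assumes non-negative features, so clip any negative readings
X = np.clip(X, 0, None)

nb = MultinomialNB()
nb.fit(X, y_enc)
```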