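I need help getting started with scikit-learn. A very simple solution as a starting point would help a lot. Pointers to examples of similar problems would also be helpful.
I have a text file (history.txt) with the following content: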
Id=101;Username=john;Date=1475359200;Announcement=111;Result=50;Title=blub;MassRequest=111;VolumeRequest=10
Id=104;Username=john;Date=1475359900;Announcement=40;Result=23;Title=blah;MassRequest=300;VolumeRequest=50
Id=222;Username=dave;Date=1475399200;Announcement=600;Result=420;Title=foo;MassRequest=40;VolumeRequest=20
Id=301;Username=john;Date=1475559200;Announcement=300;Result=150;Title=bar;MassRequest=10;VolumeRequest=33
Id=407;Username=dave;Date=1475659200;Announcement=200;Result=180;Title=blah-foo;MassRequest=90;VolumeRequest=55
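After reading this file into a pandas DataFrame, I would like to train scikit-learn on it. Given a new input "new_announce", I would like to get back a likely value for "Result".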
import pandas as pd

history = []
# Parse each "key=value;key=value" line into a dict of strings
with open("history.txt", "r") as f:
    for line in f.read().strip().split("\n"):
        dummy = {}
        for data in line.split(";"):
            if data:
                key, value = data.split("=")
                dummy[key] = value
        history.append(dummy)
# df = pd.DataFrame.from_records(history)
df = pd.DataFrame(history)
# Train scikit-learn here
new_announce = {'Id': '507',
                'Username': 'dave',
                'Announcement': '333',
                'Title': 'foobar',
                'MassRequest': '10',
                'VolumeRequest': '55'}
Thanks in advance.
I have implemented a very basic example using pandas and sklearn's MultinomialNB.
Here is the example.
import pandas as pd
from util import Util  # local helper module (not shown here)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Load CSVs into pandas DataFrames
u = Util()
reviews_df = u.getCommentsDf()

# Divide data into train and test datasets (70% / 30%)
split = int(round(0.7 * len(reviews_df)))
train = reviews_df[:split]
test = reviews_df[split:]
print("Training data")
print(train.groupby('suspended').size())
print("Testing data")
print(test.groupby('suspended').size())

vectorizer = CountVectorizer(stop_words='english')
# Learn the vocabulary dictionary and return the term-document matrix
X = vectorizer.fit_transform(train['body'].values.astype('U'))
y = train['suspended']
clf = MultinomialNB()
# Fit the Naive Bayes classifier according to X, y
clf.fit(X, y)
xTest = vectorizer.transform(test['body'].values.astype('U'))
# Perform classification on an array of test vectors
pred = clf.predict(xTest)
# Generate report
trueValue = test['suspended']
# normalize=False returns the number of correct predictions, not a fraction
print('Accuracy Score\t' + str(accuracy_score(trueValue, pred, normalize=False)))
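The example above is a text classifier, while the question asks for a numeric "Result". A minimal sketch of how the same fit/predict pattern might look on the history.txt rows is given here. It rests on several assumptions that are not in the question: "Result" is treated as a regression target, LinearRegression is picked only because it is the simplest regressor, the helper names parse_line and to_features are made up for illustration, and Id, Date and Title are simply left out. Five rows are far too few for a meaningful model, so the predicted number only demonstrates the mechanics.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Rows copied from history.txt in the question (inlined so the sketch runs as-is).
RAW = """\
Id=101;Username=john;Date=1475359200;Announcement=111;Result=50;Title=blub;MassRequest=111;VolumeRequest=10
Id=104;Username=john;Date=1475359900;Announcement=40;Result=23;Title=blah;MassRequest=300;VolumeRequest=50
Id=222;Username=dave;Date=1475399200;Announcement=600;Result=420;Title=foo;MassRequest=40;VolumeRequest=20
Id=301;Username=john;Date=1475559200;Announcement=300;Result=150;Title=bar;MassRequest=10;VolumeRequest=33
Id=407;Username=dave;Date=1475659200;Announcement=200;Result=180;Title=blah-foo;MassRequest=90;VolumeRequest=55
"""


def parse_line(line):
    """Hypothetical helper: turn 'k1=v1;k2=v2' into a dict of strings."""
    return dict(field.split("=", 1) for field in line.split(";") if field)


def to_features(frame, columns=None):
    """Hypothetical helper: numeric columns plus one-hot encoded Username."""
    numeric = frame[["Announcement", "MassRequest", "VolumeRequest"]].astype(float)
    users = pd.get_dummies(frame["Username"], prefix="user").astype(float)
    features = pd.concat([numeric, users], axis=1)
    if columns is None:
        return features
    # Align with the training columns; users missing from this frame become 0.
    return features.reindex(columns=columns, fill_value=0.0)


history = pd.DataFrame([parse_line(line) for line in RAW.strip().splitlines()])
X = to_features(history)
y = history["Result"].astype(float)

# Assumption: a plain linear model, chosen only for simplicity.
model = LinearRegression().fit(X, y)

new_announce = {"Id": "507", "Username": "dave", "Announcement": "333",
                "Title": "foobar", "MassRequest": "10", "VolumeRequest": "55"}
X_new = to_features(pd.DataFrame([new_announce]), columns=X.columns)
predicted = float(model.predict(X_new)[0])
print("Predicted Result:", predicted)
```

The reindex step matters: the new row only contains "dave", so without aligning to the training columns the model would receive fewer features than it was fitted on.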
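You can also use other tools and libraries to extract features and experiment with them.
There are good references beyond sklearn as well.
If you are looking for something specific, please post a question.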
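As one possible illustration of feature extraction, scikit-learn's DictVectorizer can turn dicts such as the ones built in the question into a numeric matrix: string values are one-hot encoded and numeric values are passed through. The records below are a hand-picked subset of the question's fields, and the numbers are converted to float by hand; this is a sketch, not a recommendation for that dataset.

```python
from sklearn.feature_extraction import DictVectorizer

# Two records using a subset of the fields from the question.
records = [
    {"Username": "john", "Title": "blub", "Announcement": 111.0},
    {"Username": "dave", "Title": "foo", "Announcement": 600.0},
]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform(records)

# Column names learned during fit, in sorted order.
feature_names = vec.feature_names_
print(feature_names)

# A new record whose Title was never seen during fit: that category is
# silently ignored, so both Title columns stay at 0.
X_new = vec.transform([{"Username": "dave", "Title": "foobar", "Announcement": 333.0}])
print(X_new)
```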