sklearn random forest classification Python memory allocation error



I am trying to run an sklearn random forest classification on 279,900 instances with 5 attributes and 1 class. When I call fit, I get a memory allocation error and the classifier cannot be trained. Any suggestions on how to resolve this?

The data is:

x, y, day, week, accuracy

x and y are coordinates, day is the day of the month (1-30), week is the day of the week (1-7), and accuracy is an integer.

Code:

import csv
import numpy as np
from sklearn.ensemble import RandomForestClassifier

with open("time_data.csv", "rb") as infile:
    re1 = csv.reader(infile)
    result = []
    ##next(reader, None)
    ##for row in reader:
    for row in re1:
        result.append(row[8])
trainclass = result[:251900]
testclass = result[251901:279953]

with open("time_data.csv", "rb") as infile:
    re = csv.reader(infile)
    coords = [(float(d[1]), float(d[2]), float(d[3]), float(d[4]), float(d[5])) for d in re if len(d) > 0]
train = coords[:251900]
test = coords[251901:279953]
print "Done splitting data into test and train data"
clf = RandomForestClassifier(n_estimators=500,max_features="log2", min_samples_split=3, min_samples_leaf=2)
clf.fit(train,trainclass)
print "Done training"
score = clf.score(test,testclass)
print "Done Testing"
print score

Error:

line 366, in fit
builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)
File "sklearn/tree/_tree.pyx", line 145, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn/tree/_tree.pyx", line 244, in sklearn.tree._tree.DepthFirstTreeBuilder.build
File "sklearn/tree/_tree.pyx", line 735, in sklearn.tree._tree.Tree._add_node
File "sklearn/tree/_tree.pyx", line 707, in sklearn.tree._tree.Tree._resize_c
File "sklearn/tree/_utils.pyx", line 39, in sklearn.tree._utils.safe_realloc
MemoryError: could not allocate 10206838784 bytes

From the scikit-learn documentation: "The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values."

I would start by tuning those parameters. You can also try a memory profiler, or, if your machine has too little RAM, run the job on Google Colaboratory.
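A minimal sketch of what "controlling tree size" looks like in practice. The data here is synthetic (the shapes and threshold values are made up for illustration, only the 5-feature layout matches the question); the point is that capping `max_depth` and raising `min_samples_leaf` bounds the node arrays each tree allocates, which is exactly what the failed `safe_realloc` call was growing:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real data: 5 features, binary class.
rng = np.random.default_rng(0)
X = rng.random((2000, 5))
y = rng.integers(0, 2, size=2000)

# Capping tree size bounds memory: each tree's node arrays stay small
# instead of growing until every leaf is pure.
clf = RandomForestClassifier(
    n_estimators=100,       # fewer trees than the original 500
    max_depth=20,           # stop growing any tree past this depth
    min_samples_leaf=5,     # larger leaves mean fewer nodes per tree
    max_features="log2",
    random_state=0,
)
clf.fit(X, y)
print(max(tree.get_depth() for tree in clf.estimators_))  # never exceeds 20
```

Fitting with `n_jobs=1` (the default) is also worth trying first, since parallel workers each hold trees in memory at the same time.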

Please try Google Colaboratory. You can connect to a local host or a hosted runtime. It worked for me with n_estimators=10000.

I recently ran into the same MemoryError. I fixed it by reducing the training data size rather than modifying the model parameters. My OOB score was 0.98, which suggests the model is unlikely to be overfitting.
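A sketch of that approach, using synthetic data (the sizes and the target rule are invented for illustration): subsample the rows before fitting, and use the forest's built-in out-of-bag score to check that the smaller sample still generalizes:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: a learnable binary target over 5 features.
rng = np.random.default_rng(0)
X = rng.random((5000, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Train on a random subset of the rows instead of the full data set.
idx = rng.choice(len(X), size=2000, replace=False)
clf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
clf.fit(X[idx], y[idx])

# The out-of-bag score estimates generalization without a held-out set;
# a high value suggests the subsample did not cost much accuracy.
print(clf.oob_score_)
```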
