Algorithmia model persistence with sklearn

I'm new to Algorithmia, but I have used scikit-learn before, and I know how to persist my machine learning model with joblib after training it:

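# On scikit-learn >= 0.23, sklearn.externals.joblib is gone: use `import joblib`
from sklearn.externals import joblib
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
# Train the model, etc.
joblib.dump(model, "prediction/model/model.pkl")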

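Now I want to host my ML model and call it as a service using Algorithmia, but I can't figure out how to read the model back. I created a collection in Algorithmia called "testCollection", which contains a file named "model.pkl" that is the result of the joblib.dump call. According to the documentation, this means my file should be located at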

data://(username)/testCollection/model.pkl

I want to read that model from the file using joblib.load. This is my algorithm on Algorithmia so far:

import Algorithmia
from sklearn.externals import joblib  # on scikit-learn >= 0.23: import joblib

def apply(input):
    client = Algorithmia.client()
    f = client.file("data://(username)/testcollection/model.pkl")
    print(f.path)
    print(f.url)
    print(f.getName())
    model = joblib.load(f.url)  # Or f.path; neither works
    return "empty"

Here is the output:

(username)/testcollection/model.pkl
/v1/data/(username)/testcollection/model.pkl
model.pkl

It errors on the joblib.load line.

Here are all the paths/URLs I have tried when calling joblib.load:

/v1/data/(username)/testcollection/model.pkl
data://(username)/testCollection/model.pkl
(username)/testCollection/model.pkl
https://algorithmia.com/v1/data/(username)/testCollection/model.pkl

How can I load the model from the file using joblib? Am I going about this the wrong way?

There are a few ways to access data through the Data API.

Here are four different ways to access a file through the Python client:

import Algorithmia
client = Algorithmia.client("<YOUR_API_KEY>")
dataFile = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getFile()
dataText = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getString()
dataJSON = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getJson()
dataBytes = client.file("data://<USER_NAME>/<COLLECTION_NAME>/<FILE_NAME>").getBytes()

Since sklearn expects a path to the model file, the easiest way is to go through the file object (dataFile above).

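According to the official Python 2.7 documentation, when a file object is created with the open() function, its name attribute holds the name of the file, which usually corresponds to its path.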

In that case, you would write something like this:

import Algorithmia
from sklearn.externals import joblib  # on scikit-learn >= 0.23: import joblib
def apply(input):
    # You don't need to write your API key if you're editing in the web editor
    client = Algorithmia.client()
    modelFile = client.file("data://(username)/testcollection/model.pkl").getFile()
    modelFilePath = modelFile.name
    model = joblib.load(modelFilePath)
    return "empty"
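The role of the name attribute can be seen in isolation with a standard-library temporary file. This is only a sketch: whether getFile() is backed by a temporary file is an assumption here, and write_temp_copy is a hypothetical helper, not part of the Algorithmia client.

```python
import os
import tempfile

def write_temp_copy(data: bytes) -> str:
    """Write bytes to a named temporary file and return its path (hypothetical helper)."""
    with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
        tmp.write(data)
        # For a named temporary file, .name is the path on disk.
        return tmp.name

# Stand-in for the bytes of a serialized model.
path = write_temp_copy(b"fake model bytes")

# A file object created with open() reports the path it was opened with.
with open(path, "rb") as fh:
    contents = fh.read()
    opened_name = fh.name

os.remove(path)  # clean up the temporary file
```

Any function that only accepts a filename can be handed such a path, which is the same idea as passing modelFile.name to joblib.load.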

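However, according to the official sklearn model persistence documentation, you should also be able to pass a file-like object instead of a filename.

So we can skip the part where we try to get the filename and just pass the modelFile object: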

import Algorithmia
from sklearn.externals import joblib  # on scikit-learn >= 0.23: import joblib
def apply(input):
    # You don't need to write your API key if you're editing in the web editor
    client = Algorithmia.client()
    modelFile = client.file("data://(username)/testcollection/model.pkl").getFile()
    model = joblib.load(modelFile)
    return "empty"
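The file-like route can also be tried without touching the Data API, using an in-memory buffer. This is a minimal sketch under stated assumptions: the dictionary stands in for a trained model, load_from_bytes is a hypothetical helper, and the fallback to pickle is there only so the snippet runs where joblib is not installed.

```python
import io

try:
    import joblib as serializer  # preferred: the library used in this answer
except ImportError:
    import pickle as serializer  # stand-in so the sketch runs without joblib

def load_from_bytes(raw: bytes):
    """Deserialize an object from raw bytes via an in-memory file-like object (hypothetical helper)."""
    return serializer.load(io.BytesIO(raw))

# Simulate a downloaded model file: dump a stand-in "model" into memory.
buffer = io.BytesIO()
serializer.dump({"n_estimators": 10}, buffer)

restored = load_from_bytes(buffer.getvalue())
```

In principle the same helper could be fed the result of getBytes(), though that combination is untested here.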

Edit: There is also an article in the official Algorithmia Developer Center about model persistence in scikit-learn.

Full disclosure: I work at Algorithmia as an algorithm engineer.
