Loading a FastText model from S3 without saving it locally

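I want to use a FastText model in an ML pipeline. I trained the model and saved it as a .bin file on S3. I'd like to keep everything in a cloud-based pipeline, so I don't want local files. I feel like I'm really close, but I can't figure out how to make a temporary .bin file. I'm also not sure whether I'm saving and reading the FastText model in the most efficient way. The code below works, but it saves the file locally, which is what I want to avoid.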

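import fasttext
import smart_open

# model_uri holds the S3 location of the .bin model
file = smart_open.smart_open(model_uri)
listed = b''.join([i for i in file])
# Write the bytes to a local file, since load_model() takes a path
with open("ml_model.bin", "wb") as binary_file:
    binary_file.write(listed)
model = fasttext.load_model("ml_model.bin")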

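If you want to use the fasttext package, which wraps the official Facebook FastText code, you will probably need to create a local temporary copy. Your trouble suggests that the code relies on opening a local file path.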

You could also try the Gensim package's separate FastText support, which should accept an S3 path via its load_facebook_model() function:

https://radimrehurek.com/gensim/models/fasttext.html#gensim.models.fasttext.load_facebook_model

(Note that Gensim doesn't support all FastText functionality, such as the supervised mode.)
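For reference, the Gensim route might look roughly like the sketch below. It assumes gensim is installed along with S3 support for smart_open, and that Gensim opens the path through smart_open so an s3:// URI is read remotely. The helper names s3_uri and load_gensim_fasttext are made up for illustration; only load_facebook_model() comes from Gensim.

```python
def s3_uri(bucket: str, key: str) -> str:
    """Build an s3:// URI from a bucket name and an object key."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def load_gensim_fasttext(bucket: str, key: str):
    """Load a Facebook-format FastText .bin model via Gensim (hypothetical helper)."""
    # Imported lazily; requires gensim (and S3 support in smart_open).
    from gensim.models.fasttext import load_facebook_model

    # Assumption: Gensim opens the path through smart_open, so an
    # s3:// URI is read remotely without an explicit local copy.
    return load_facebook_model(s3_uri(bucket, key))
```

You would call it with your own bucket and key, and the returned model exposes its word vectors through model.wv. I haven't tested this against a real bucket, so treat it as a starting point.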

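As the response above partially answered, a temporary file is needed. On top of that, the temporary file's path has to be passed as a string object rather than a Path, which is a bit odd. Working code below: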

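import tempfile
import fasttext
import smart_open
from pathlib import Path

# bucket_name and key identify the .bin model on S3
file = smart_open.smart_open(f's3://{bucket_name}/{key}')
listed = b''.join([i for i in file])
# The directory and the file in it are deleted when the block exits
with tempfile.TemporaryDirectory() as tdir:
    tfile = Path(tdir).joinpath('tempfile.bin')
    tfile.write_bytes(listed)
    # load_model() expects a str path, not a Path object
    model = fasttext.load_model(str(tfile))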
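A possible variation on the code above: rather than joining the whole model into one bytes object first, the stream can be copied to the temporary file in chunks. This is only a sketch. load_via_tempfile and load_fasttext_from_s3 are hypothetical helper names, it uses smart_open.open() (the newer spelling of smart_open.smart_open()), and it assumes fasttext has finished reading the file by the time load_model() returns, so the file can be deleted afterwards.

```python
import os
import shutil
import tempfile
from typing import Any, BinaryIO, Callable


def load_via_tempfile(stream: BinaryIO, loader: Callable[[str], Any],
                      suffix: str = ".bin") -> Any:
    """Copy a binary stream to a temp file, call loader(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            # Copies in chunks, so the model is never held in memory twice.
            shutil.copyfileobj(stream, tmp)
        return loader(path)  # path is already a plain str
    finally:
        os.remove(path)


def load_fasttext_from_s3(uri: str) -> Any:
    """Load a fasttext model from an s3:// URI (hypothetical helper)."""
    # Imported lazily so load_via_tempfile works without these packages.
    import fasttext
    import smart_open

    with smart_open.open(uri, "rb") as remote:
        return load_via_tempfile(remote, fasttext.load_model)
```

The loader is passed in as a parameter so the temp-file handling can be checked with any function that takes a path, without touching S3.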
