I am deploying a machine learning model (named entity recognition, NER) with Django. In short, the user selects a field (e.g. politics or science) and types text into the search area; the model then identifies the named entities in that text.
My problem is that the ML model (the encoder) is reloaded every time the view is triggered, which slows everything down. Do you know how to optimize this so it is loaded only once?
My views.py:
def search_view(request):
    context = {}
    if request.method == 'POST':
        field = request.POST['field']
        query = request.POST['query']
        encoder = load_encoder(field)  # reloaded on every request
        results = Ner_model(query, encoder)
        context['result'] = results
    return render(request, 'ner/results.html', context)
The load_encoder function:
def load_encoder(field):
    path_encoder = os.path.join(field, 'field_encoder')
    encoder = AutoTokenizer.from_pretrained(path_encoder)
    return encoder
Thanks!
It is best to load all the possible encoders and models up front, at module level, and then pick the appropriate one for each request. For example, if you have two fields:
encoders = {}
fields = ['field1', 'field2']
for field in fields:
    path_encoder = os.path.join(field, 'field_encoder')
    encoders[field] = AutoTokenizer.from_pretrained(path_encoder)

def load_encoder(field):
    return encoders.get(field)
Inspired by @Masoud's suggestion, I defined two global variables: one to store the loaded encoder and another to track the selected field. Although global variables are generally discouraged, this is what I ended up with:
encoder = None
old_field = ''

def search_view(request):
    global encoder
    global old_field
    context = {}
    if request.method == 'POST':
        field = request.POST['field']
        query = request.POST['query']
        # Reload only when no encoder is loaded yet or the field changed.
        if (not encoder) or (old_field != field):
            encoder = load_encoder(field)
            old_field = field
        results = Ner_model(query, encoder)
        context['result'] = results
    return render(request, 'ner/results.html', context)
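A single global encoder still throws the previous encoder away whenever the user switches fields, and it is not safe if the server runs multiple threads. A dictionary keyed by field, guarded by a lock, keeps every loaded encoder and loads each one at most once. A minimal sketch, where `fake_loader` is a hypothetical stand-in for the real `load_encoder`:

```python
import threading

# Process-wide cache of encoders keyed by field, guarded by a lock so two
# concurrent requests cannot both load the same encoder.
_encoders = {}
_lock = threading.Lock()

def get_encoder(field, loader):
    enc = _encoders.get(field)
    if enc is None:
        with _lock:
            enc = _encoders.get(field)   # re-check under the lock
            if enc is None:
                enc = loader(field)      # e.g. AutoTokenizer.from_pretrained(...)
                _encoders[field] = enc
    return enc

# Demo with a counting stand-in loader.
calls = []
def fake_loader(field):
    calls.append(field)
    return f"encoder-{field}"

a = get_encoder("politics", fake_loader)
b = get_encoder("politics", fake_loader)   # cached, loader not called again
c = get_encoder("science", fake_loader)
```

In the view you would then call `get_encoder(field, load_encoder)` instead of `load_encoder(field)`, and the `old_field` bookkeeping disappears.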