How to resolve coreferences with AllenNLP and coref-spanbert-large without internet access



I want to run coreference resolution with AllenNLP and the coref-spanbert-large model without internet access. I tried to do it the way described at https://demo.allennlp.org/coreference-resolution

My code:

from allennlp.predictors.predictor import Predictor
import allennlp_models.tagging
predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
example = 'Paul Allen was born on January 21, 1953, in Seattle, Washington, to Kenneth Sam Allen and Edna Faye Allen.Allen attended Lakeside School, a private school in Seattle, where he befriended Bill Gates, two years younger, with whom he shared an enthusiasm for computers.'
pred = predictor.predict(document=example)
coref_res = predictor.coref_resolved(example)
print(pred)
print(coref_res)

The code works fine when I have internet access, but when I don't, I get the following error:

Traceback (most recent call last):
File "C:/Users/aap/Desktop/CoreNLP/Coref_AllenNLP.py", line 14, in <module>
predictor = Predictor.from_path(r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz")
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\predictors\predictor.py", line 361, in from_path
load_archive(archive_path, cuda_device=cuda_device, overrides=overrides),
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 206, in load_archive
config.duplicate(), serialization_dir
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\models\archival.py", line 232, in _load_dataset_readers
dataset_reader_params, serialization_dir=serialization_dir
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 604, in from_params
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 632, in from_params
kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 200, in create_kwargs
cls.__name__, param_name, annotation, param.default, params, **extras
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 307, in pop_and_construct_arg
return construct_arg(class_name, name, popped_params, annotation, default, **extras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 391, in construct_arg
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 341, in construct_arg
return annotation.from_params(params=popped_params, **subextras)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 604, in from_params
**extras,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\from_params.py", line 634, in from_params
return constructor_to_call(**kwargs)  # type: ignore
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_mismatched_indexer.py", line 63, in __init__
**kwargs,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\token_indexers\pretrained_transformer_indexer.py", line 58, in __init__
model_name, tokenizer_kwargs=tokenizer_kwargs
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\data\tokenizers\pretrained_transformer_tokenizer.py", line 71, in __init__
model_name, add_special_tokens=False, **tokenizer_kwargs
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\allennlp\common\cached_transformers.py", line 110, in get_tokenizer
**kwargs,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 362, in from_pretrained
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\models\auto\configuration_auto.py", line 368, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\configuration_utils.py", line 424, in get_config_dict
use_auth_token=use_auth_token,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\file_utils.py", line 1087, in cached_path
local_files_only=local_files_only,
File "C:\Users\aap\Desktop\CoreNLP\corenlp\lib\site-packages\transformers\file_utils.py", line 1268, in get_from_cache
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Process finished with exit code 1

Please tell me, what do I need so that my code works without internet access?

You will need local copies of the transformer model's configuration file and vocabulary, so that the tokenizer and token indexer don't need to download them:

from transformers import AutoConfig, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(transformer_model_name)
config = AutoConfig.from_pretrained(transformer_model_name)
tokenizer.save_pretrained(local_config_path)
config.to_json_file(local_config_path + "/config.json")
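After saving, it can be worth sanity-checking that the local directory actually contains the files the tokenizer will look for before going offline. A minimal sketch, assuming a BERT-style tokenizer (the exact file list is an assumption based on what `save_pretrained()` typically writes; `missing_files` is a hypothetical helper, not part of any library):

```python
from pathlib import Path

# Files a BERT-style tokenizer/config save typically produces
# (an assumption; adjust for your particular model).
EXPECTED_FILES = ["config.json", "tokenizer_config.json", "vocab.txt"]

def missing_files(local_config_path):
    """Return the expected files that are absent from local_config_path."""
    root = Path(local_config_path)
    return [name for name in EXPECTED_FILES if not (root / name).exists()]
```

If `missing_files(local_config_path)` returns an empty list, the offline load has what it needs; otherwise it names the files you still have to save or download.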

Then you need to override the transformer model name in the configuration so that it points at the local directory (local_config_path) where you saved these files:

predictor = Predictor.from_path(
    r"C:\Users\aap\Desktop\coref-spanbert-large-2021.03.10.tar.gz",
    overrides={
        "dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "validation_dataset_reader.token_indexers.tokens.model_name": local_config_path,
        "model.text_field_embedder.tokens.model_name": local_config_path,
    },
)
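If you are unsure which keys to pass in `overrides` for a different archive, the config can be inspected directly: an AllenNLP model archive is a .tar.gz containing a config.json. A sketch under that assumption (`find_model_name_keys` is a hypothetical helper):

```python
import json
import tarfile

def find_model_name_keys(archive_path):
    """Collect the dotted path and value of every '*model_name' entry
    in the config.json inside a model archive (.tar.gz).

    Useful for discovering which keys to override with a local path.
    Assumes the archive contains a config.json, as AllenNLP archives do.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        member = next(m for m in tar.getmembers()
                      if m.name.endswith("config.json"))
        config = json.load(tar.extractfile(member))

    found = []

    def walk(node, prefix=""):
        if isinstance(node, dict):
            for key, value in node.items():
                path = f"{prefix}.{key}" if prefix else key
                if key.endswith("model_name"):
                    found.append((path, value))
                else:
                    walk(value, path)

    walk(config)
    return found
```

Each returned dotted path is a candidate key for the `overrides` dict shown above.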

I ran into a similar problem using structured-prediction-srl-bert without internet access, and I saw 4 downloads in the logs:

  1. dataset_reader.bert_model_name = bert-base-uncased, downloading 4 files
  2. model INFO vocabulary.py - Loading token dictionary from data/structured-prediction-srl-bert.2020.12.15/vocabulary. Downloading... 4 smaller files
  3. Spacy model "en_core_web_sm" not found
  4. Later on, [nltk_data] Error loading punkt: <urlopen error [Errno -3] Temporary failure in name resolution> and [nltk_data] Error loading wordnet: <urlopen error [Errno -3] Temporary failure in name resolution>

I solved this with the following steps:

  1. structured-prediction-srl-bert:
  • I downloaded the model from https://demo.allennlp.org/semantic-role-labeling ("Model Card" tab) - https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz
  • I unzipped it into ./data/structured-prediction-srl-bert.2020.12.15
  • The code:

pip install allennlp==2.10.0 allennlp-models==2.10.0

from allennlp.predictors.predictor import Predictor
predictor = Predictor.from_path("./data/structured-prediction-srl-bert.2020.12.15/")
  2. bert-base-uncased:
  • I created a folder ./data/bert-base-uncased and downloaded the following files into it from https://huggingface.co/bert-base-uncased/tree/main:
    • config.json
    • tokenizer.json
    • tokenizer_config.json
    • vocab.txt
    • pytorch_model.bin

I had to change "bert_model_name" from "bert-base-uncased" to the path "./data/bert-base-uncased"; the former causes downloads. This has to be done in ./data/structured-prediction-srl-bert.2020.12.15/config.json, where it occurs twice.

  3. python -m spacy download en_core_web_sm
  4. python -c 'import nltk; nltk.download("punkt"); nltk.download("wordnet")'
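The manual config.json edit from step 2 can also be scripted, which helps when a model name occurs in several places. A sketch (`point_config_at_local_models` is a hypothetical helper; the example paths are the ones from the steps above):

```python
import json

def point_config_at_local_models(config_path, replacements):
    """Rewrite model-name strings in an extracted archive's config.json.

    `replacements` maps a remote model name (e.g. "bert-base-uncased")
    to a local directory. Covers every occurrence regardless of
    nesting depth.
    """
    with open(config_path) as f:
        config = json.load(f)

    def walk(node):
        if isinstance(node, dict):
            return {k: walk(v) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(v) for v in node]
        if isinstance(node, str) and node in replacements:
            return replacements[node]
        return node

    with open(config_path, "w") as f:
        json.dump(walk(config), f, indent=2)

# Example, using the paths from the steps above:
# point_config_at_local_models(
#     "./data/structured-prediction-srl-bert.2020.12.15/config.json",
#     {"bert-base-uncased": "./data/bert-base-uncased"},
# )
```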

After these steps, allennlp no longer needs internet access.
