I am trying to reproduce the code from this page.
At my workplace we have access to the transformers and pytorch libraries, but we cannot connect to the internet from our python environment. Could anyone help with how to get the script working after manually downloading the files to my machine?
My specific questions are:
-
Should I download all of the files from "bert-base-uncased at main"? Do I need to put them in a folder with a specific name?
How should I modify the code below?
# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tokenize our sentence with the BERT tokenizer.
tokenized_text = tokenizer.tokenize(marked_text)
And how should I modify this code?
# Load pre-trained model (weights)
model = BertModel.from_pretrained('bert-base-uncased',
output_hidden_states = True, # Whether the model returns all hidden-states.
)
If anyone has done this, please let me know. Thank you!
### Update 1
I went to the link, manually downloaded all the files into a folder, and pointed the code at that folder. The tokenizer works, but this line fails:
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states = True)
Any idea what I should do? I noticed that the 4 large downloaded files have very strange names... Should I rename them to the names shown on the page above? Do I need to download any other files?
The error message is:
OSError: Unable to load weights from pytorch checkpoint file for 'bert-base-uncased2/' at 'bert-base-uncased/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
Clone the model repo to download all the files:
git lfs install
git clone https://huggingface.co/bert-base-uncased
# if you want to clone without large files – just their pointers
# prepend your git clone with the following env var:
GIT_LFS_SKIP_SMUDGE=1
git usage:
Download git from here: https://git-scm.com/downloads
Paste these into your cli (terminal):
a. git lfs install
b. git clone https://huggingface.co/bert-base-uncased
Wait for the download to finish; it takes some time. (If you want, watch your network activity to monitor the download progress.)
Find the current directory: just paste cd into your cli to get the file path (e.g. "C:/Users/........./bert-base-uncased")
Use it like this:
from transformers import BertModel, BertTokenizer
model = BertModel.from_pretrained("C:/Users/........./bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("C:/Users/........./bert-base-uncased")
Manual download, without git:
Download all the files from here: https://huggingface.co/bert-base-uncased/tree/main
Put them into a folder named "yourfoldername"
Use it like this:
model = BertModel.from_pretrained("C:/Users/........./yourfoldername")
tokenizer = BertTokenizer.from_pretrained("C:/Users/........./yourfoldername")
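As a quick sanity check before calling from_pretrained, you can verify the folder actually contains the files a BERT pytorch checkpoint needs. This is just a minimal sketch (the helper name is mine, not from the transformers library); the three file names are the standard ones shown on the model page:

```python
from pathlib import Path

# Standard file names for a BERT pytorch checkpoint folder.
REQUIRED = ["config.json", "vocab.txt", "pytorch_model.bin"]

def missing_files(folder):
    """Return the required checkpoint files that are absent from `folder`."""
    folder = Path(folder)
    return [name for name in REQUIRED if not (folder / name).is_file()]

# Example: report what still needs to be downloaded.
# missing = missing_files("C:/Users/........./yourfoldername")
# if missing:
#     print("still missing:", missing)
```

If the downloaded files have strange names (see the question's update), rename them to these standard names so this check passes before loading.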
Only for the model (manual download, without git):
Just click the download button here and download the pytorch pretrained model (about 420mb): https://huggingface.co/bert-base-uncased/blob/main/pytorch_model.bin
Also download the config.json file from here: https://huggingface.co/bert-base-uncased/tree/main
Put them both into a folder named "yourfilename"
Use it like this:
model = BertModel.from_pretrained("C:/Users/........./yourfilename")
Reply for "### Update 1" ('If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True'):
Please try the example from here -> https://huggingface.co/transformers/model_doc/bert.html
from transformers import BertTokenizer, BertForMaskedLM
import torch
tokenizer = BertTokenizer.from_pretrained("C:/Users/........./bert-base-uncased")
model = BertForMaskedLM.from_pretrained("C:/Users/........./bert-base-uncased")
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
outputs = model(**inputs, labels=labels)
loss = outputs.loss
logits = outputs.logits
If this works, we know there is nothing wrong with the file system or the folder names.
If it works, also try to get the hidden states. (Note that the bert model already returns the hidden states, as the docs explain: "The bare Bert Model transformer outputting raw hidden-states without any specific head on top." So you don't need output_hidden_states = True.)
from transformers import BertTokenizer, BertModel
import torch
tokenizer = BertTokenizer.from_pretrained("C:/Users/........./bert-base-uncased")
model = BertModel.from_pretrained("C:/Users/........./bert-base-uncased")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state
If this does not work, try loading the pytorch model directly with one of the following:
# Load all tensors onto the CPU
torch.load("C:/Users/........./bert-base-uncased/pytorch_model.bin", map_location=torch.device('cpu'))
# Load all tensors onto GPU 1
torch.load("C:/Users/........./bert-base-uncased/pytorch_model.bin", map_location=lambda storage, loc: storage.cuda(1))
If the pytorch loading methods do not work either, we know there is a pytorch version compatibility problem between pytorch 1.4.0 and the released Bert pytorch model, or your pytorch_model.bin file did not download properly. (Please note that pytorch 1.4.0 is the last release that supports python 3.4.)
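One common cause of the "unable to load weights" error is that pytorch_model.bin is actually a Git LFS pointer file (a few hundred bytes of text) rather than the real ~420mb weights; this happens when the repo is cloned without git lfs installed, or with GIT_LFS_SKIP_SMUDGE=1. A minimal sketch to check for that (the helper name is mine; the magic line is the standard Git LFS pointer header):

```python
from pathlib import Path

# Every Git LFS pointer file starts with this spec line.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path):
    """True if `path` holds a Git LFS pointer instead of real weights.

    Pointer files are tiny text files starting with the LFS spec line,
    while a real pytorch checkpoint is a large binary file.
    """
    p = Path(path)
    if p.stat().st_size > 1024:  # real BERT weights are hundreds of MB
        return False
    return p.read_bytes().startswith(LFS_MAGIC)

# Example: if this prints True, run `git lfs pull` in the cloned repo,
# or download the file via the download button on the model page.
# print(is_lfs_pointer("C:/Users/........./bert-base-uncased/pytorch_model.bin"))
```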