File "train.py", line 35, in clip_gradient: modulenorm = p.grad.data.norm()

I'm not sure why I'm getting this error:

python train.py --batch-size 20 --rnn_type GRU --cuda --gpu 1 --lr 0.0001 --mdl RNN --clip_norm 1 --opt Adam
/scratch/sjn-p2/anaconda/anaconda2/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
There are 2 CUDA devices
Setting torch GPU to 1
Using device:1 
Stored Environment:['term_len', 'word_index', 'glove', 'max_len', 'train', 'dev', 'test', 'index_word']
Loaded environment
Creating Model...
Setting Pretrained Embeddings
Initialized GRU model
Starting training
Namespace(aggregation='mean', attention_width=5, batch_size=20, clip_norm=1, cuda=True, dataset='Restaurants', dev=1, dropout_prob=0.5, embedding_size=300, epochs=50, eval=1, gpu=1, hidden_layer_size=300, l2_reg=0.0, learn_rate=0.0001, log=1, maxlen=0, mode='term', model_type='RNN', opt='Adam', pretrained=1, rnn_direction='uni', rnn_layers=1, rnn_size=300, rnn_type='GRU', seed=1111, term_model='mean', toy=False, trainable=1)
========================================================================
/scratch2/debate_tweets/sentiment/pytorch_sentiment_rnn/models/rnn.py:51: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
decoded = self.softmax(decoded)
Traceback (most recent call last):
  File "train.py", line 343, in <module>
    exp.train()
  File "train.py", line 326, in train
    loss = self.train_batch(i)
  File "train.py", line 303, in train_batch
    coeff = clip_gradient(self.mdl, self.args.clip_norm)
  File "train.py", line 35, in clip_gradient
    modulenorm = p.grad.data.norm()
AttributeError: 'NoneType' object has no attribute 'data'
[jalal@goku pytorch_sentiment_rnn]$

I have followed all the steps in the README for train.py from https://github.com/vanzytay/pytorch_sentiment_rnn up to this point. What do you think needs to be fixed?

OS: CentOS Linux release 7.4.1708 (Core)
PyTorch version: 0.3.1.post2
How you installed PyTorch (conda, pip, source): conda install -c pytorch pytorch
Python version: Python 2.7.14 |Anaconda custom (64-bit)| (default, Dec 7 2017, 17:05:42)
CUDA/cuDNN version: CUDA Version 8.0.61
GPU models and configuration: GP102 [GeForce GTX 1080 Ti], driver=nvidia latency=0
GCC version (if compiling from source): [GCC 7.2.0] on linux2

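I'm not sure whether I have actually solved the problem (I don't get the same accuracy as the README reports), but here is what I did. I added an "if p.grad is not None:" check here: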

import math  # needed for math.sqrt below

def clip_gradient(model, clip):
    """Computes a gradient clipping coefficient based on gradient norm."""
    totalnorm = 0
    for p in model.parameters():
        # Skip parameters that received no gradient in this backward pass.
        if p.grad is not None:
            modulenorm = p.grad.data.norm()
            totalnorm += modulenorm ** 2
    totalnorm = math.sqrt(totalnorm)
    return min(1, clip / (totalnorm + 1e-6))
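
Skipping parameters whose gradient is None should not change the norm itself, since a missing gradient adds nothing to the sum of squares, the same as an all-zero gradient. Below is a minimal pure-Python sketch of the same computation, using plain lists of floats in place of tensors. The helper name clip_coefficient and the sample values are made up for illustration and are not taken from the repo:

```python
import math

def clip_coefficient(grads, clip, eps=1e-6):
    """Return min(1, clip / (global_norm + eps)) over a list of gradients.

    `grads` is a list whose entries are either a list of floats
    (a stand-in for one parameter's gradient) or None.
    """
    total_sq = 0.0
    for g in grads:
        if g is None:  # no gradient for this parameter: contributes nothing
            continue
        total_sq += sum(x * x for x in g)
    total_norm = math.sqrt(total_sq)
    return min(1.0, clip / (total_norm + eps))

# Illustrative values only: one parameter with gradient [3, 4], one without.
with_none = clip_coefficient([[3.0, 4.0], None], clip=1.0)
# Same gradients, but the missing one is written out as zeros.
with_zeros = clip_coefficient([[3.0, 4.0], [0.0, 0.0]], clip=1.0)

# Both coefficients are identical (about 0.2, because the norm is 5).
print("%.6f %.6f" % (with_none, with_zeros))
```

If that reasoning holds for the real tensors too, the None check on its own is unlikely to be what changes the accuracy.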

and I also added an "if p.grad is not None:" check here:

def train_batch(self, i):
    ''' Trains a regular RNN model
    '''
    sentence, targets, actual_batch = self.make_batch(self.train_set, i)
    if(sentence is None):
        return None
    hidden = self.mdl.init_hidden(actual_batch)
    hidden = repackage_hidden(hidden)
    self.mdl.zero_grad()
    output, hidden = self.mdl(sentence, hidden)
    loss = self.criterion(output, targets)
    loss.backward()
    if(self.args.clip_norm>0):
        coeff = clip_gradient(self.mdl, self.args.clip_norm)
        for p in self.mdl.parameters():
            if p.grad is not None:
                p.grad.mul_(coeff)
    self.optimizer.step()
    return loss.data[0]
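
To find out which parameters end up with no gradient after loss.backward(), a check along these lines could be run right after the backward call. This is only a sketch: params_without_grad is a hypothetical helper, and the FakeParam stand-ins (with made-up names and values) just mimic the .grad attribute so the snippet runs without PyTorch. With the real model, the input would presumably be self.mdl.named_parameters().

```python
class FakeParam(object):
    """Stand-in for a model parameter; only the .grad attribute matters here."""
    def __init__(self, grad):
        self.grad = grad

def params_without_grad(named_params):
    """Return the names of parameters whose .grad is None.

    `named_params` is any iterable of (name, param) pairs, such as the
    pairs yielded by model.named_parameters() in PyTorch.
    """
    return [name for name, p in named_params if p.grad is None]

# Hypothetical parameter names and gradient values, for illustration only.
fake_params = [
    ("embedding.weight", FakeParam(None)),
    ("rnn.weight_ih_l0", FakeParam([0.1, -0.2])),
    ("decoder.bias", FakeParam([0.0])),
]

missing = params_without_grad(fake_params)
print(missing)
```

A parameter that shows up in such a list usually either did not take part in computing the loss for that batch or does not require gradients, so knowing which ones they are may help decide whether skipping them is harmless.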

The accuracy I get is as follows. For the RNN model:

python train.py --batch-size 20 --rnn_type GRU --cuda --gpu 1 --lr 0.0001 --mdl RNN --clip_norm 1 --opt Adam
[Epoch 50] Train Loss=0.953654762131 T=0.51s
Test loss=0.90144520998
Output Distribution={2: 1120}
Accuracy=0.65

For the TD-RNN model:

python train.py --batch-size 20 --rnn_type GRU --cuda --gpu 1 --lr 0.0001 --mdl TD-RNN --clip_norm 1 --opt Adam
[Epoch 50] Train Loss=0.64427837713 T=0.99s
Test loss=0.828059911728
Output Distribution={0: 165, 1: 138, 2: 817}
Accuracy=0.719642857143

However, the RNN accuracy reported in the GitHub README is:

[Epoch 50] Train Loss=0.680990989366
Test loss=0.810974478722
Output Distribution={0: 158, 1: 158, 2: 804}
Accuracy=0.733035714286

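What could be causing this difference in accuracy? How would you fix the error without reducing accuracy? I'm not sure whether my fix is what caused the accuracy to drop.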
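
Note that in the logs above, the RNN run reports Output Distribution={2: 1120}, meaning every test example is predicted as class 2, while the README run spreads its predictions over classes 0, 1 and 2. A small sketch for comparing the two distributions (predicted_share is a made-up helper name, not something from the repo):

```python
def predicted_share(output_distribution):
    """Map each predicted class to its share of all predictions."""
    total = float(sum(output_distribution.values()))
    return dict((label, count / total)
                for label, count in output_distribution.items())

mine = predicted_share({2: 1120})                    # RNN run from this post
readme = predicted_share({0: 158, 1: 158, 2: 804})   # RNN run from the README

print(mine)
print(readme)
```

When a model predicts a single class for every example, its accuracy equals that class's share of the test labels. The 0.65 from the RNN run would therefore be consistent with a model that has not learned much beyond the majority class, which may be a separate issue from the None-gradient check.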

I used the fix from this link: https://discuss.pytorch.org/t/model-parameters-is-none-while-training/6830/2?u=monajalal
