DuplicateFlagError when trying to train the TensorFlow Object Detection API on Google Colab



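I am trying to train the TensorFlow Object Detection API on a dataset of apples and peppers. To do this, I generated the required files (TFRecords and the annotated images) and placed them in the models/research/object_detection directory. I then forked the Object Detection API on GitHub and pushed my files to the fork. Finally, I cloned that repository in Google Colaboratory and ran train.py, but I get a DuplicateFlagError for the 'master' flag.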

---------------------------------------------------------------------------
DuplicateFlagError               Traceback (most recent call last)
/content/models/research/object_detection/train.py in <module>()
     56 
     57 flags = tf.app.flags
---> 58 flags.DEFINE_string('master', '', 'Name of the TensorFlow master to use.')
     59 flags.DEFINE_integer('task', 0, 'task id')
     60 flags.DEFINE_integer('num_clones', 1, 'Number of clones to deploy per worker.')
/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/flags.py in wrapper(*args, **kwargs)
     56           'Use of the keyword argument names (flag_name, default_value, '
     57           'docstring) is deprecated, please use (name, default, help) instead.')
---> 58     return original_function(*args, **kwargs)
     59 
     60   return tf_decorator.make_decorator(original_function, wrapper)
/usr/local/lib/python3.6/dist-packages/absl/flags/_defines.py in DEFINE_string(name, default, help, flag_values, **args)
    239   parser = _argument_parser.ArgumentParser()
    240   serializer = _argument_parser.ArgumentSerializer()
--> 241   DEFINE(parser, name, default, help, flag_values, serializer, **args)
    242 
    243 
/usr/local/lib/python3.6/dist-packages/absl/flags/_defines.py in DEFINE(parser, name, default, help, flag_values, serializer, module_name, **args)
     80   """
     81   DEFINE_flag(_flag.Flag(parser, serializer, name, default, help, **args),
---> 82               flag_values, module_name)
     83 
     84 
/usr/local/lib/python3.6/dist-packages/absl/flags/_defines.py in DEFINE_flag(flag, flag_values, module_name)
    102   # Copying the reference to flag_values prevents pychecker warnings.
    103   fv = flag_values
--> 104   fv[flag.name] = flag
    105   # Tell flag_values who's defining the flag.
    106   if module_name:
/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py in __setitem__(self, name, flag)
    425         # module is simply being imported a subsequent time.
    426         return
--> 427       raise _exceptions.DuplicateFlagError.from_flag(name, self)
    428     short_name = flag.short_name
    429     # If a new flag overrides an old one, we need to cleanup the old flag's
DuplicateFlagError: The flag 'master' is defined twice. First from object_detection/train.py, Second from object_detection/train.py.  Description from first occurrence: Name of the TensorFlow master to use.

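To work around this, I commented out that line, but then I got a DuplicateFlagError on the next flag, i.e. on the next line. So I commented out every line in train.py that declares these flags, lines 58 through 82. After that, however, I got a NotFoundError: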

---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
/content/models/research/object_detection/train.py in <module>()
    165 
    166 if __name__ == '__main__':
--> 167   tf.app.run()
/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py in run(main, argv)
    124   # Call the main function, passing through any arguments
    125   # to the final program.
--> 126   _sys.exit(main(argv))
    127 
    128 
/content/models/research/object_detection/train.py in main(_)
    105                            ('input.config', FLAGS.input_config_path)]:
    106         tf.gfile.Copy(config, os.path.join(FLAGS.train_dir, name),
--> 107                       overwrite=True)
    108 
    109   model_config = configs['model']
/usr/local/lib/python3.6/dist-packages/tensorflow/python/lib/io/file_io.py in copy(oldpath, newpath, overwrite)
    390   with errors.raise_exception_on_not_ok_status() as status:
    391     pywrap_tensorflow.CopyFile(
--> 392         compat.as_bytes(oldpath), compat.as_bytes(newpath), overwrite, status)
    393 
    394 
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    514             None, None,
    515             compat.as_text(c_api.TF_Message(self.status.status)),
--> 516             c_api.TF_GetCode(self.status.status))
    517     # Delete the underlying status object from memory otherwise it stays alive
    518     # as there is a reference to status from this from the traceback due to
NotFoundError: ; No such file or directory

How should I fix this? Here is my Colab notebook: https://drive.google.com/file/d/1mZGOKX3JZXyG4XYkI6WHIXoNbRSpkE_F/view?usp=sharing

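# Delete all previously defined flags before train.py declares them again.
import tensorflow as tf

def del_all_flags(FLAGS):
    # Collect the names first, because deleting changes the underlying dict.
    flags_dict = FLAGS._flags()
    keys_list = [key for key in flags_dict]
    for key in keys_list:
        FLAGS.__delattr__(key)

del_all_flags(tf.flags.FLAGS)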
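
Why this probably helps: the flags live in a single registry that is shared by the whole Python process, and a notebook keeps that process alive between cell runs. The error message names object_detection/train.py as both the first and the second definer, which fits train.py being executed twice in the same session. The sketch below is only a toy model of that behaviour in plain Python, so it runs without TensorFlow; ToyFlagRegistry, define_string, clear and run_training_script are hypothetical names for illustration, not absl's or TensorFlow's API.

```python
class DuplicateFlagError(Exception):
    """Toy counterpart of the absl error with the same name."""


class ToyFlagRegistry:
    """Hypothetical, simplified model of a process-wide flag registry."""

    def __init__(self):
        self._flags = {}

    def define_string(self, name, default, help_text):
        # Registering a name that is already present is treated as an error.
        if name in self._flags:
            raise DuplicateFlagError("The flag %r is defined twice." % name)
        self._flags[name] = (default, help_text)

    def clear(self):
        # Same idea as del_all_flags: forget every registered flag.
        for name in list(self._flags):
            del self._flags[name]


def run_training_script(registry):
    # Stands in for the flag definitions at the top of train.py.
    registry.define_string("master", "", "Name of the TensorFlow master to use.")


registry = ToyFlagRegistry()   # lives as long as the notebook's Python process
run_training_script(registry)  # first run of the cell: fine

try:
    run_training_script(registry)  # second run in the same process
    second_run_failed = False
except DuplicateFlagError:
    second_run_failed = True

registry.clear()               # the role del_all_flags plays for the real registry
run_training_script(registry)  # the definitions succeed again
```

In the real notebook, the equivalent of registry.clear() is calling del_all_flags(tf.flags.FLAGS) before the flag definitions are executed again.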

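After going through your Colab notebook and your modified fork of the tensorflow/models GitHub repository, here is how I got it working on my local machine.

I installed the latest TensorFlow release, 1.6, which is the same version that Google Colab uses.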

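  1. The path you specified in ssd_mobilenet_v1_coco.config is data/object-detection.pbtxt, which is a relative path. So train.py has to be run from the models/research/object_detection directory.

  2. train.py expects --pipeline_config_path as an argument, but you passed --pipeline_config. If you read the train.py code, you will see that when --pipeline_config_path is not specified it falls back to default config file names such as models.config, which do not exist here, so you get NotFoundError: ; No such file or directory.

So the final command should look like this: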

ubuntu@Himanshu:~/Desktop/models/research/object_detection$ python train.py --logtostderr --train_dir=training --pipeline_config_path=training/ssd_mobilenet_v1_coco.config
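  3. With TensorFlow 1.6 installed, I also ran into an error that others have reported: __init__() got an unexpected keyword argument 'dct_method'.

As suggested in the comments on that report: remove dct_method=dct_method in object_detection/data_decoders/tf_example_decoder.py, around line 109.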
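
Points 1 and 2 can be checked before launching a long training run. The following is a minimal sketch, not part of the Object Detection API: build_train_args and missing_paths are hypothetical helper names, and base_dir assumes the repository was cloned to /content/models as in the traceback from the question.

```python
import os


def build_train_args(train_dir, pipeline_config_path):
    """Return the argument list for train.py, using the flag names from the answer."""
    return [
        "python", "train.py",
        "--logtostderr",
        "--train_dir=" + train_dir,
        "--pipeline_config_path=" + pipeline_config_path,
    ]


def missing_paths(base_dir, relative_paths):
    """Return the relative paths that do not exist under base_dir."""
    return [p for p in relative_paths
            if not os.path.exists(os.path.join(base_dir, p))]


# Directory that train.py should be run from (assumed clone location).
base_dir = "/content/models/research/object_detection"
# Files the answer refers to, relative to base_dir.
needed = ["training/ssd_mobilenet_v1_coco.config", "data/object-detection.pbtxt"]

args = build_train_args("training", "training/ssd_mobilenet_v1_coco.config")
print(" ".join(args))
print("missing:", missing_paths(base_dir, needed))
```

If the second print lists anything, the relative paths in the config will not resolve from that directory, and fixing that first is cheaper than waiting for train.py to fail.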

Hope this helps.
