This repository was archived by the owner on Jul 7, 2023. It is now read-only.

RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint #1930

@919294AkshatSharma

Description

Training fails with a RuntimeError when running:

$ t2t-trainer --generate_data --data_dir=/t2t_data --output_dir=/t2t_train/deque \
    --problem=text2text_copyable_tokens --model=neural_deque_model \
    --hparams_set=neural_deque --train_steps=100 --eval_steps=5
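One way to separate the training side from the evaluation side is to rerun the same flags with t2t-trainer's `--schedule` flag set to `train`. The traceback shows T2T dispatching `getattr(exp, FLAGS.schedule)()`, which defaults to `continuous_train_and_eval` here. The sketch below only builds and prints that command. It assumes the `train` schedule runs `Estimator.train` without the evaluate-on-save listener, and that the data from the first run is already in /t2t_data. The helper `build_train_only_command` is hypothetical.

```python
import shlex
from typing import List

# Flags copied verbatim from the failing command in this report.
REPORTED_FLAGS = [
    "--data_dir=/t2t_data",
    "--output_dir=/t2t_train/deque",
    "--problem=text2text_copyable_tokens",
    "--model=neural_deque_model",
    "--hparams_set=neural_deque",
    "--train_steps=100",
    "--eval_steps=5",
]


def build_train_only_command(generate_data: bool = False) -> List[str]:
    """Return argv for a training-only t2t-trainer run (hypothetical helper).

    --generate_data is off by default on the assumption that the first
    run already wrote the dataset to /t2t_data.
    """
    cmd = ["t2t-trainer"]
    if generate_data:
        cmd.append("--generate_data")
    cmd.extend(REPORTED_FLAGS)
    # Assumed: the "train" schedule calls Estimator.train only, so no
    # evaluation is triggered when the final checkpoint is saved.
    cmd.append("--schedule=train")
    return cmd


if __name__ == "__main__":
    argv = build_train_only_command()
    # Printed as one shell line (Python 3.7 has no shlex.join);
    # subprocess.run(argv, check=True) would execute it instead.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

If /t2t_train/deque contains a `checkpoint` file and `model.ckpt-<step>.*` files after this run, checkpoint saving itself works. The failure would then more likely lie in how the evaluator locates the model directory.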

Environment information

OS: Ubuntu 18.04.5

$ pip freeze | grep tensor

mesh-tensorflow==0.1.21
tensor2tensor==1.15.7
tensorboard==1.15.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==1.15.0
tensorflow-addons==0.19.0
tensorflow-datasets==3.2.1
tensorflow-estimator==1.15.1
tensorflow-gan==2.1.0
tensorflow-hub==0.13.0
tensorflow-io-gcs-filesystem==0.32.0
tensorflow-metadata==1.12.0
tensorflow-probability==0.7.0
tensorstore==0.1.28

$ python -V
Python 3.7.12

For bugs: reproduction and error logs

# Steps to reproduce: ... 
# Error logs:

Traceback (most recent call last):
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 35, in <module>
    tf.app.run(main)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/snoop/tracer.py", line 173, in simple_wrapper
    return function(*args, **kwargs)
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 30, in main
    t2t_trainer.main(argv)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 418, in main
    execute_schedule(exp)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 371, in execute_schedule
    getattr(exp, FLAGS.schedule)()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/utils/trainer_lib.py", line 468, in continuous_train_and_eval
    self._eval_spec)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1495, in _train_with_estimator_spec
    any_step_done = True
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 861, in __exit__
    self._close_internal(exception_type)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 894, in _close_internal
    h.end(self._coordinated_creator.tf_sess)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 600, in end
    self._save(session, last_step)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 619, in _save
    if l.after_save(session, step):
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 519, in after_save
    self._evaluate(global_step_value) # updates self.eval_result
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 544, in _evaluate
    'Eval status: {}'.format(self.eval_result.status))
RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint
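The traceback ends in the evaluation triggered by the final checkpoint save. In the tensorflow_estimator 1.15 source (worth re-checking against the installed copy), `_evaluate` raises this error whenever the evaluator's status is not "evaluated". The "missing checkpoint" status appears to be returned when `Estimator.latest_checkpoint()` finds nothing in the model directory. That lookup goes through `tf.train.latest_checkpoint`, which reads the `checkpoint` state file in the model directory and then checks that the referenced `<prefix>.index` file exists.

The sketch below approximates that check with the standard library only, so it can be pointed at /t2t_train/deque without TensorFlow. It assumes the following:

* The directory is local (not `gs://`).
* Checkpoints use the V2 format.
* The helper name `find_latest_checkpoint` is hypothetical.

It is not TensorFlow's implementation.

```python
import os
import re
import sys
from typing import Optional


def find_latest_checkpoint(model_dir: str) -> Optional[str]:
    """Approximate tf.train.latest_checkpoint for a local directory (hypothetical helper).

    Assumes the TF1 layout: a text file named `checkpoint` whose
    `model_checkpoint_path` entry names a V2 checkpoint prefix, with a
    `<prefix>.index` file written next to the data shards.
    """
    state_file = os.path.join(model_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None  # No state file: the evaluator would report "missing checkpoint".
    with open(state_file) as f:
        match = re.search(r'^model_checkpoint_path:\s*"([^"]+)"', f.read(), re.MULTILINE)
    if match is None:
        return None  # State file exists but names no checkpoint.
    prefix = match.group(1)
    if not os.path.isabs(prefix):
        # Relative prefixes are resolved against the model directory.
        prefix = os.path.join(model_dir, prefix)
    # A prefix whose .index file is missing does not count as a usable checkpoint.
    return prefix if os.path.isfile(prefix + ".index") else None


if __name__ == "__main__":
    # Defaults to the --output_dir from the failing command.
    target = sys.argv[1] if len(sys.argv) > 1 else "/t2t_train/deque"
    print(find_latest_checkpoint(target) or "no usable checkpoint in " + target)
```

A `None` result while `model.ckpt-*` files are present would point to a stale or absolute path inside the `checkpoint` file. A missing state file would suggest the save itself never reached that directory.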
