This repository was archived by the owner on Jul 7, 2023. It is now read-only.
RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint #1930
Open
Description
Training fails with a RuntimeError when running the following command:
t2t-trainer --generate_data --data_dir=/t2t_data --output_dir=/t2t_train/deque --problem=text2text_copyable_tokens --model=neural_deque_model --hparams_set=neural_deque --train_steps=100 --eval_steps=5
Environment information
OS: Ubuntu:18.04.5
$ pip freeze | grep tensor
mesh-tensorflow==0.1.21
tensor2tensor==1.15.7
tensorboard==1.15.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==1.15.0
tensorflow-addons==0.19.0
tensorflow-datasets==3.2.1
tensorflow-estimator==1.15.1
tensorflow-gan==2.1.0
tensorflow-hub==0.13.0
tensorflow-io-gcs-filesystem==0.32.0
tensorflow-metadata==1.12.0
tensorflow-probability==0.7.0
tensorstore==0.1.28
$ python -V
Python 3.7.12
For bugs: reproduction and error logs
# Steps to reproduce:
...

# Error logs:
Traceback (most recent call last):
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 35, in <module>
    tf.app.run(main)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/snoop/tracer.py", line 173, in simple_wrapper
    return function(*args, **kwargs)
  File "/opt/conda/envs/NeuralStack/bin/t2t-trainer", line 30, in main
    t2t_trainer.main(argv)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 418, in main
    execute_schedule(exp)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/bin/t2t_trainer.py", line 371, in execute_schedule
    getattr(exp, FLAGS.schedule)()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensor2tensor/utils/trainer_lib.py", line 468, in continuous_train_and_eval
    self._eval_spec)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
    return executor.run()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
    return self.run_local()
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
    saving_listeners=saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
    saving_listeners)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1495, in _train_with_estimator_spec
    any_step_done = True
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 861, in __exit__
    self._close_internal(exception_type)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/monitored_session.py", line 894, in _close_internal
    h.end(self._coordinated_creator.tf_sess)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 600, in end
    self._save(session, last_step)
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_core/python/training/basic_session_run_hooks.py", line 619, in _save
    if l.after_save(session, step):
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 519, in after_save
    self._evaluate(global_step_value)  # updates self.eval_result
  File "/opt/conda/envs/NeuralStack/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/training.py", line 544, in _evaluate
    'Eval status: {}'.format(self.eval_result.status))
RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint
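Since the error reports "Eval status: missing checkpoint", one thing that may be worth checking is whether any checkpoint files actually landed in the directory passed as --output_dir. The snippet below is only a diagnostic sketch, not a fix and not part of tensor2tensor: the helper name `list_checkpoint_steps` is hypothetical, and it assumes the default TensorFlow 1.x checkpoint naming (`model.ckpt-<step>.index`). It uses only the standard library, so it runs without importing TensorFlow.

```python
import os
import re

# Assumption: checkpoints use the default TF 1.x basename, e.g. "model.ckpt-100.index".
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def list_checkpoint_steps(model_dir):
    """Return the sorted global steps that have a checkpoint index file in model_dir.

    Hypothetical helper for illustration; returns [] if the directory is missing.
    """
    if not os.path.isdir(model_dir):
        return []
    steps = []
    for name in os.listdir(model_dir):
        match = _CKPT_INDEX.match(name)
        if match:
            steps.append(int(match.group(1)))
    return sorted(steps)


if __name__ == "__main__":
    # Path taken from --output_dir in the reported command.
    steps = list_checkpoint_steps("/t2t_train/deque")
    print("checkpoint steps found:", steps or "none")
```

If this prints "none", the directory contains no checkpoint index files under that naming scheme; it does not by itself explain why.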