This repository was archived by the owner on Jun 19, 2025. It is now read-only.

Reload weights after plateau#3245

Closed
DanBmh wants to merge 3 commits into mozilla:master from DanBmh:reload_rlrop

Conversation

@DanBmh
Contributor

@DanBmh DanBmh commented Aug 12, 2020

Reload the checkpoint weights after reaching a plateau, so that training continues from the best_dev weights again.

@community-tc-integration

No Taskcluster jobs started for this pull request
The `allowPullRequests` configuration for this repository (in `.taskcluster.yml` on the default branch) does not allow starting tasks for this pull request.
@lissyx
Collaborator

lissyx commented Aug 13, 2020

@DanBmh Thanks, can you elaborate a little bit? My mind is kind of somewhere else, so I'm unsure I get the point here.

@DanBmh
Contributor Author

DanBmh commented Aug 13, 2020

Currently training looks like this:

epoch 5: val_loss=62
e6: vl=59
e7: vl=60
e8: vl=61
Reached a plateau, LearningRate:=LR*0.1
e9: vl=60   <- Here we're using the weights from e8; with the suggested changes we're using e6 instead
e10: vl=58  <- We have an improvement, but the network has to do some more work to fix the errors from e7+e8

The old approach still worked well, but I think we can make it even better by reloading the weights from the best_dev checkpoint.
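
For readers following along, the idea can be sketched roughly like this. It is a minimal, framework-free illustration and not the code from this PR: `PlateauState`, `on_epoch_end`, `patience` and `factor` are hypothetical names, and the patience of 2 epochs is an assumption chosen to match the numbers in the log.

```python
from dataclasses import dataclass


@dataclass
class PlateauState:
    """Bookkeeping for reduce-on-plateau with a reload of the best weights."""
    learning_rate: float
    best_loss: float = float("inf")
    best_epoch: int = -1
    epochs_without_improvement: int = 0


def on_epoch_end(state, epoch, val_loss, patience=2, factor=0.1):
    """Update the state after one epoch.

    Returns the epoch whose checkpoint should be reloaded when a plateau
    is detected, otherwise None.
    """
    if val_loss < state.best_loss:
        # New best validation loss: this is where a best_dev checkpoint
        # would be written.
        state.best_loss = val_loss
        state.best_epoch = epoch
        state.epochs_without_improvement = 0
        return None

    state.epochs_without_improvement += 1
    if state.epochs_without_improvement >= patience:
        # Plateau: reduce the learning rate and ask the caller to restore
        # the best weights instead of continuing from the current ones.
        state.learning_rate *= factor
        state.epochs_without_improvement = 0
        return state.best_epoch
    return None


# Replay the validation losses from the log above (epochs 5 to 8).
state = PlateauState(learning_rate=0.001)
reloads = {}
for epoch, val_loss in [(5, 62), (6, 59), (7, 60), (8, 61)]:
    reload_from = on_epoch_end(state, epoch, val_loss)
    if reload_from is not None:
        reloads[epoch] = reload_from
```

With these numbers the plateau is detected after epoch 8 and the function asks for the epoch 6 weights, so epoch 9 would start from the best_dev state rather than from the slightly worse e8 weights.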

@DanBmh
Contributor Author

DanBmh commented Aug 14, 2020

Not ready yet!
Found an error when using --drop_source_layers flag

@DanBmh DanBmh changed the title from "Reload weights after plateau" to "WIP: Reload weights after plateau" Aug 14, 2020
@DanBmh
Contributor Author

DanBmh commented Aug 14, 2020

Working again:)

@DanBmh DanBmh changed the title from "WIP: Reload weights after plateau" to "Reload weights after plateau" Aug 14, 2020
@lissyx lissyx requested a review from reuben August 18, 2020 16:07
Collaborator

@lissyx lissyx left a comment

LGTM, I'd like Reuben's opinion.

Comment on lines +648 to +649
# Reload checkpoint that we use the best_dev weights again
load_or_init_graph_for_training(session, allow_drop_layers=False)
Contributor

Please keep this function unchanged and add a new, explicit load_best_checkpoint function; we don't want this call to silently load the last checkpoint, for example.
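
To illustrate the concern, here is a rough sketch of the difference between a general loader with fallbacks and an explicit best-only loader. It is not the project's actual implementation: the dict-based `store`, the `load_or_init` helper and its best/last/init fallback order are assumptions made for the example. Only the name `load_best_checkpoint` comes from the suggestion above.

```python
def load_or_init(store):
    """General-purpose loader (hypothetical): falls back step by step.

    Tries the 'best' checkpoint, then the 'last' one, and finally
    returns freshly initialised weights.
    """
    for kind in ("best", "last"):
        if kind in store:
            return kind, store[kind]
    return "init", {"w": 0.0}


def load_best_checkpoint(store):
    """Explicit loader: only ever returns the best_dev weights.

    Raises instead of silently falling back to another checkpoint.
    """
    if "best" not in store:
        raise FileNotFoundError("no best_dev checkpoint available")
    return store["best"]


# A checkpoint directory that only contains a 'last' checkpoint.
store = {"last": {"w": 2.0}}

# The general loader quietly returns the last checkpoint ...
kind, weights = load_or_init(store)

# ... while the explicit loader makes the missing checkpoint visible.
try:
    load_best_checkpoint(store)
    best_missing = False
except FileNotFoundError:
    best_missing = True
```

In the plateau case a silent fallback would defeat the purpose of the reload, which is why a dedicated function that fails loudly is the safer choice in this sketch.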

@reuben
Contributor

reuben commented Aug 19, 2020

Re-opened as #3261 to run tests.

reuben added a commit that referenced this pull request Aug 20, 2020
@reuben
Contributor

reuben commented Aug 20, 2020

Merged in #3261

@reuben reuben closed this Aug 20, 2020
