Restarting interrupted training / checkpoints #7

Open
deccolquitt opened this issue May 17, 2022 · 4 comments

Comments

@deccolquitt

Is there any way to restart interrupted training? I can't see any checkpoint-related option for train_main.py.

@galgreshler
Owner

Hi, I haven't implemented such a feature, but it should be quite easy: training is done for each scale independently, and the networks of each finished scale are saved, so you could resume training from the last finished scale. If you decide to implement this, I would be happy to add the feature to the repository.
Gal
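
For anyone attempting this, here is a minimal sketch of the idea, not the repository's actual code: it assumes hypothetical names, namely per-scale generators saved as netG_<scale>.pth in the results directory and a train_single_scale() function that trains one scale given the previously trained ones.

```python
import os
import torch

def resume_training(results_dir, num_scales, train_single_scale):
    """Resume multi-scale training after the last scale whose networks were saved."""
    # Find which scales already have a saved generator (hypothetical file naming).
    finished = [s for s in range(num_scales)
                if os.path.exists(os.path.join(results_dir, f"netG_{s}.pth"))]
    start_scale = max(finished) + 1 if finished else 0

    # Load the networks of the already-finished scales instead of retraining them.
    generators = [torch.load(os.path.join(results_dir, f"netG_{s}.pth"))
                  for s in range(start_scale)]

    # Train only the remaining scales, building on the loaded ones.
    for s in range(start_scale, num_scales):
        generators.append(train_single_scale(s, generators))
    return generators
```

The key point is that nothing before start_scale is retrained; the loop simply picks up where the saved results end.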

@deccolquitt
Author

Unfortunately I don't know enough about coding to do this; whenever I have tried rerunning with the same dataset, it just creates a new directory and starts from scratch. Thanks anyway.

@deccolquitt
Author

Would this be the right sort of thing to look at? https://stackoverflow.com/questions/42703500/best-way-to-save-a-trained-model-in-pytorch
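
For reference, the pattern that answer recommends is saving the model's state_dict and loading it back into a model with the same architecture. A minimal, self-contained illustration (the nn.Linear is just a placeholder model):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                          # placeholder model
torch.save(model.state_dict(), "checkpoint.pth")  # save only the weights

restored = nn.Linear(10, 2)                       # must match the saved architecture
restored.load_state_dict(torch.load("checkpoint.pth"))
restored.eval()                                   # set eval mode before inference
```

This only saves and restores weights; resuming training in this repository would additionally need the per-scale logic sketched above.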

@galgreshler
Owner

Yes, but that saving is already done during training. To implement continuation of an existing model, you have to load its saved networks and then continue training the following scales.
