Restarting interrupted training / checkpoints #7

Open
deccolquitt opened this issue May 17, 2022 · 4 comments

Comments

@deccolquitt

Is there any way to restart interrupted training? I can't see any checkpoint-related option for train_main.py.

@galgreshler
Owner

Hi, I haven't implemented such a feature, but it should be quite easy: training is done for each scale independently, and the networks of each finished scale are saved, so you could resume training from the last finished scale. If you decide to implement this, I would be happy to add the feature to the repository.
Gal
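
For anyone attempting this, here is a minimal sketch of the idea, not the repository's actual code: it assumes hypothetical names, namely per-scale generators saved as netG_<scale>.pth in the results directory and a train_single_scale() function that trains one scale given the previously trained ones.

```python
import os
import torch

def resume_training(results_dir, num_scales, train_single_scale):
    """Resume multi-scale training after the last scale whose networks were saved."""
    # Find which scales already have a saved generator (hypothetical file naming).
    finished = [s for s in range(num_scales)
                if os.path.exists(os.path.join(results_dir, f"netG_{s}.pth"))]
    start_scale = max(finished) + 1 if finished else 0

    # Load the networks of the already-finished scales instead of retraining them.
    generators = [torch.load(os.path.join(results_dir, f"netG_{s}.pth"))
                  for s in range(start_scale)]

    # Train only the remaining scales, building on the loaded ones.
    for s in range(start_scale, num_scales):
        generators.append(train_single_scale(s, generators))
    return generators
```

The key point is that nothing before start_scale is retrained; the loop simply picks up where the saved results end.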

@deccolquitt
Author

Unfortunately I don't know enough about coding to do this; whenever I have tried rerunning with the same dataset, it just creates a new directory and starts from scratch. Thanks anyway.

@deccolquitt
Author

Would this be the right sort of thing to look at? https://stackoverflow.com/questions/42703500/best-way-to-save-a-trained-model-in-pytorch
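
For reference, the pattern that answer recommends is saving the model's state_dict and loading it back into a model with the same architecture. A minimal, self-contained illustration (the nn.Linear is just a placeholder model):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                          # placeholder model
torch.save(model.state_dict(), "checkpoint.pth")  # save only the weights

restored = nn.Linear(10, 2)                       # must match the saved architecture
restored.load_state_dict(torch.load("checkpoint.pth"))
restored.eval()                                   # set eval mode before inference
```

This only saves and restores weights; resuming training in this repository would additionally need the per-scale logic sketched above.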

@galgreshler
Owner

Yes, but that saving is already done during training. To implement continuation of an existing model, you have to load its saved networks and then continue training the following scales.
