This project uses reinforcement learning (Deep Q-Learning) to train an agent on several games.
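For readers new to Deep Q-Learning, here is a minimal sketch of the training target a Q-network is usually fitted to: the reward plus the discounted best Q-value of the next state, with no bootstrapping on terminal steps. This is not the project's code. The function name `dqn_targets`, the discount factor, and the sample batch are all illustrative assumptions.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Return r + gamma * max_a' Q(s', a') for each transition.

    Terminal transitions (done=True) use the reward alone, since there
    is no next state to bootstrap from.
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


# Hypothetical batch: two transitions, two possible actions each.
targets = dqn_targets(
    rewards=[1.0, -1.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.5,
)
print(targets)  # [2.0, -1.0]
```

In a full agent, `next_q_values` would typically come from a network evaluated on the next states, and the loss would compare these targets with the Q-values predicted for the actions actually taken.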
Early stages of training:
Final stage of training (52 min to reach the maximum score 3 times in Colab, 186 episodes):
The PyTorch model reached the same goal in less than 1 minute (218 episodes, not run in Colab).