Here are some resources for learning Deep Learning as a mediocre programmer.
I found the best way to get started was reading the Dive into Deep Learning book, which is comprehensible and offers enough depth and breadth.
I constantly ran into math concepts in d2l. Some were new to me, and some were things I learned at school but hadn't touched in more than 10 years. Below are some sites that explain math very well; they might even make you love math!
If you still find matrices hard to understand, the following lecture notes might help you as they helped me.
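For example, one intuition those notes build early on is that a matrix-vector product is just a weighted combination of the matrix's columns. A minimal sketch in PyTorch (the numbers are my own toy example, not from the notes):

```python
import torch

A = torch.tensor([[1., 2.],
                  [3., 4.]])
x = torch.tensor([10., 100.])

# A @ x is a weighted combination of the columns of A,
# with the entries of x as the weights.
by_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

assert torch.allclose(A @ x, by_columns)  # both are [210., 430.]
```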
Many ideas in d2l are hard to understand, which is normal: the book explains and implements complex ideas within short chapters. I struggled to build intuition for some important ones, and the articles below helped me a lot (a few minimal sketches of these ideas follow the list).
- Understanding softmax and the negative log-likelihood (archive)
- What is torch.nn really? (archive)
- Softmax Temperature (archive)
- Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention) (archive)
- Understanding and Coding the Self-Attention Mechanism of Large Language Models From Scratch (archive)
- The Illustrated Transformer (archive)
- The Random Transformer (archive)
- Transformer Architecture: The Positional Encoding (archive)
- Sentence Embeddings. Introduction to Sentence Embeddings (archive)
- How to generate text: using different decoding methods for language generation with Transformers (archive)
- Understanding Rotary Positional Encoding (archive)
- Why Are Sines and Cosines Used For Positional Encoding? (archive)
- LLaMA-2 from the Ground Up (archive)
- Understanding Llama2: KV Cache, Grouped Query Attention, Rotary Embedding and More (archive)
- SwiGLU (archive)
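To make some of these ideas concrete, here are a few minimal PyTorch sketches. They are my own toy illustrations, not code from the articles. First, softmax, the negative log-likelihood, and temperature:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.1])  # raw scores for 3 classes
target = torch.tensor(0)                # the true class index

# Softmax turns logits into a probability distribution.
probs = F.softmax(logits, dim=-1)

# Negative log-likelihood: the loss is -log(probability of the true class).
nll = -torch.log(probs[target])
# cross_entropy fuses softmax + NLL, so the two should agree:
assert torch.allclose(nll, F.cross_entropy(logits, target))

# Temperature rescales logits before softmax: T > 1 flattens the
# distribution (more random when sampling), T < 1 sharpens it.
for T in (0.5, 1.0, 2.0):
    print(T, F.softmax(logits / T, dim=-1))
```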
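Next, single-head scaled dot-product self-attention, the core computation the attention articles walk through (the shapes and random weights here are arbitrary choices of mine):

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x
    of shape (seq_len, d_model). Bare-bones: no masking, no multi-head."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    # Each query is scored against every key, scaled by sqrt(d_k).
    scores = q @ k.T / d_k ** 0.5             # (seq_len, seq_len)
    weights = torch.softmax(scores, dim=-1)   # each row sums to 1
    return weights @ v                        # weighted sum of values

torch.manual_seed(0)
x = torch.randn(5, 16)                         # 5 tokens, d_model = 16
w_q, w_k, w_v = (torch.randn(16, 16) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([5, 16])
```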
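The sinusoidal positional encoding from the original Transformer paper, which the two positional-encoding articles above dissect:

```python
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
       PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    Sines and cosines give every position a unique, smoothly varying
    code, and a fixed offset between positions becomes a rotation of
    each (sin, cos) pair. Assumes d_model is even."""
    pos = torch.arange(seq_len).unsqueeze(1)   # (seq_len, 1)
    i = torch.arange(0, d_model, 2)            # even dimensions
    freq = 1.0 / 10000 ** (i / d_model)        # (d_model / 2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * freq)
    pe[:, 1::2] = torch.cos(pos * freq)
    return pe

print(sinusoidal_positional_encoding(50, 128).shape)  # torch.Size([50, 128])
```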
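A toy version of the KV cache, the decoding optimization the Llama2 article explains: keys and values of past tokens are stored so each generation step only computes attention for the newest token (the function and variable names are mine):

```python
import torch

d = 16                    # head dimension
k_cache, v_cache = [], []

def decode_step(q_new, k_new, v_new):
    # Append this token's key/value, then attend over the whole cache.
    k_cache.append(k_new)
    v_cache.append(v_new)
    K = torch.stack(k_cache)                  # (t, d), grows each step
    V = torch.stack(v_cache)
    scores = (q_new @ K.T) / d ** 0.5         # scores vs. all past tokens
    return torch.softmax(scores, dim=-1) @ V  # (d,)

for _ in range(4):                            # 4 decoding steps
    out = decode_step(*(torch.randn(d) for _ in range(3)))
print(out.shape, len(k_cache))                # torch.Size([16]) 4
```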
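And SwiGLU, the feed-forward variant used in LLaMA-style models (the hidden size below is just illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: down(silu(gate(x)) * up(x))."""
    def __init__(self, d_model, d_hidden):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        # SiLU(x) = x * sigmoid(x); the gated branch modulates
        # the up projection elementwise.
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(2, 8, 512)         # (batch, seq_len, d_model)
print(SwiGLU(512, 1376)(x).shape)  # torch.Size([2, 8, 512])
```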
The following LLaMA 2 implementations in PyTorch are very similar, but their code comments are largely complementary, so I found reading them side by side really helpful.
Once you are familiar with the PyTorch code, the C/C++ implementations help you understand things at a deeper level.