Most of this was written while I was reading the “Attention is all you need” paper. The most important resources will be The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time and 9.3. Transformer — Dive into Deep Learning 0.7 documentation.
- BLEU is a metric of how good machine translation is.
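    As a rough illustration only (the real metric is corpus-level and supports multiple references), sentence-level BLEU can be sketched as clipped n-gram precision combined with a brevity penalty:

    ```python
    from collections import Counter
    import math

    def bleu(candidate, reference, max_n=4):
        """Toy sentence-level BLEU for token lists; not the full corpus metric."""
        precisions = []
        for n in range(1, max_n + 1):
            # Count n-grams, clipping candidate counts by reference counts.
            cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
            ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
            overlap = sum(min(c, ref[g]) for g, c in cand.items())
            precisions.append(overlap / max(sum(cand.values()), 1))
        if min(precisions) == 0:
            return 0.0
        # Brevity penalty punishes candidates shorter than the reference.
        bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
        return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
    ```

    A perfect match scores 1.0; a translation sharing no n-grams with the reference scores 0.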
- Gentle Introduction to Transduction in Machine Learning
  - Induction: deriving the function from the given data.
  - Deduction: deriving the values of the given function for points of interest.
  - Transduction: deriving the values of the unknown function for points of interest from the given data.
- Positional encoding in the Transformer is very well described at 9.3. Transformer — Dive into Deep Learning 0.7 documentation, with a visualization. It’s needed because the architecture has no notion of word order.[^1] We can’t just number words n=1..10 because sentences have different lengths, and word 3 out of 10 is not the same as word 3 out of 3.
- “The intuition here is that adding these values to the embeddings provides meaningful distances between the embedding vectors once they’re projected into Q/K/V vectors and during dot-product attention”[^2]
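    The sinusoidal scheme from the paper (PE[pos, 2i] = sin(pos/10000^(2i/d)), PE[pos, 2i+1] = cos of the same angle) can be sketched in pure Python; this minimal version assumes an even d_model:

    ```python
    import math

    def positional_encoding(max_len, d_model):
        """Return a max_len x d_model matrix of sinusoidal position encodings."""
        pe = [[0.0] * d_model for _ in range(max_len)]
        for pos in range(max_len):
            for i in range(0, d_model, 2):
                angle = pos / (10000 ** (i / d_model))
                pe[pos][i] = math.sin(angle)      # even dimensions: sine
                pe[pos][i + 1] = math.cos(angle)  # odd dimensions: cosine
        return pe
    ```

    Each position gets a distinct vector, and the varying wavelengths let the model attend to relative offsets regardless of sentence length.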
- Subword algorithms are ways to represent words using units bigger than single characters but smaller than whole words (for example, prefixes and suffixes), to better handle unseen words. Byte-pair and word-piece encodings are used by the Transformer.[^swa]
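    A toy sketch of the byte-pair-encoding learning loop: repeatedly merge the most frequent adjacent symbol pair. (Real implementations also track end-of-word markers and frequencies from a large corpus.)

    ```python
    from collections import Counter

    def learn_bpe(words, num_merges):
        """Learn a list of merge rules from a list of words (toy BPE)."""
        vocab = Counter(tuple(w) for w in words)  # words as tuples of characters
        merges = []
        for _ in range(num_merges):
            # Count adjacent symbol pairs across the vocabulary.
            pairs = Counter()
            for word, freq in vocab.items():
                for a, b in zip(word, word[1:]):
                    pairs[(a, b)] += freq
            if not pairs:
                break
            best = max(pairs, key=pairs.get)
            merges.append(best)
            merged = best[0] + best[1]
            # Rewrite every word, fusing occurrences of the best pair.
            new_vocab = Counter()
            for word, freq in vocab.items():
                out, i = [], 0
                while i < len(word):
                    if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                        out.append(merged)
                        i += 2
                    else:
                        out.append(word[i])
                        i += 1
                new_vocab[tuple(out)] += freq
            vocab = new_vocab
        return merges
    ```

    On `["low", "low", "lower"]` the first merges build up the shared stem `low`, which is exactly how frequent stems and affixes become single tokens.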
- In essence, label smoothing will help your model train around mislabeled data and consequently improve its robustness and performance.[^3]
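    What label smoothing does to a one-hot target is tiny: the standard formula is (1 − ε)·y + ε/K over K classes (ε = 0.1 below is just an illustrative value, the one used in the paper):

    ```python
    def smooth_labels(one_hot, eps=0.1):
        """Soften a one-hot target: true class gets 1 - eps + eps/K, rest get eps/K."""
        k = len(one_hot)
        return [(1 - eps) * y + eps / k for y in one_hot]
    ```

    The target still sums to 1, but the model is no longer pushed to produce infinitely confident logits for the labeled class.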
- Attention? Attention! is a really nice intro to Attention in general. And the blog itself (Lil’Log) is also absolutely fascinating.
- neural networks - What exactly are keys, queries, and values in attention mechanisms? - Cross Validated
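    The Q/K/V story boils down to scaled dot-product attention, softmax(QKᵀ/√d_k)·V: each query is compared against all keys, and the resulting weights mix the values. A minimal pure-Python sketch:

    ```python
    import math

    def softmax(xs):
        m = max(xs)  # subtract max for numerical stability
        exps = [math.exp(x - m) for x in xs]
        s = sum(exps)
        return [e / s for e in exps]

    def attention(queries, keys, values):
        """Scaled dot-product attention over lists of vectors (rows)."""
        d_k = len(keys[0])
        out = []
        for q in queries:
            # Similarity of this query to every key, scaled by sqrt(d_k).
            scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
            weights = softmax(scores)
            # Weighted average of the value vectors.
            out.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
        return out
    ```

    A query closely aligned with one key pulls out (mostly) that key’s value, which is the “soft dictionary lookup” intuition from the Cross Validated answer.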
- The Transformer – Attention is all you need. - Michał Chromiak’s blog + the other 3 posts in the blog.
- Dive into Deep Learning — Dive into Deep Learning 0.7 documentation looks really cool, has all the basics, and is executable. Another candidate for main textbook.
- The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time is absolutely wonderful in all aspects.
- I should make a better Bash timer that counts down to the end of the hour, so I don’t have to do this in my head.
- I should make a vim keybinding or script that automagically creates Markdown references. (I’d be surprised if this hasn’t been done)