Transformer
Attention is all you need!
The three most important keys to understanding the Transformer are:
1. Positional Encoding: injects token-order information so the whole sequence can be processed in parallel, with no recurrence (see the sketch after this list)

2. Encoder-Decoder mechanism

3. (Multi-head) (Self) Attention Mechanism

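As a rough illustration of item 1, here is a minimal NumPy sketch of the sinusoidal positional encoding used in the original paper. The function name and the specific dimensions are only for illustration; real implementations usually precompute this and add it to the token embeddings.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (assumes d_model is even).

    Each position gets a unique pattern of sines and cosines, so the model
    can recover token order even though all positions are processed in parallel.
    """
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]              # (1, d_model/2)
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)
    angles = positions * angle_rates                       # (seq_len, d_model/2)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe

# The encoding is simply added to the token embeddings before the first layer.
pe = sinusoidal_positional_encoding(seq_len=8, d_model=16)
print(pe.shape)  # (8, 16)
```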
Make sure to watch the following three videos "in order" so that you don't get overwhelmed immediately.
Note: to enable the network to learn while doing attention, the input is projected to a lower dimension with different learned linear projections (one per head). Each of the H heads learns a different attention pattern; their outputs are concatenated and passed through a final linear projection to yield the result.
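A minimal NumPy sketch of that note, assuming a single unmasked self-attention layer with randomly initialized matrices standing in for learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, seed=0):
    """Sketch of multi-head self-attention (no masking, no bias terms).

    x: (seq_len, d_model). Each head projects x down to d_k = d_model / num_heads,
    attends in that lower-dimensional space, and the head outputs are
    concatenated and passed through a final output projection W_o.
    """
    rng = np.random.default_rng(seed)
    seq_len, d_model = x.shape
    d_k = d_model // num_heads

    head_outputs = []
    for _ in range(num_heads):
        # Each head has its own projections; random here, learned in practice.
        W_q = rng.normal(size=(d_model, d_k)) / np.sqrt(d_model)
        W_k = rng.normal(size=(d_model, d_k)) / np.sqrt(d_model)
        W_v = rng.normal(size=(d_model, d_k)) / np.sqrt(d_model)

        Q, K, V = x @ W_q, x @ W_k, x @ W_v        # (seq_len, d_k) each
        scores = Q @ K.T / np.sqrt(d_k)            # (seq_len, seq_len)
        weights = softmax(scores, axis=-1)         # attention weights per token
        head_outputs.append(weights @ V)           # (seq_len, d_k)

    # Concatenate the heads, then apply the final output projection.
    concat = np.concatenate(head_outputs, axis=-1)  # (seq_len, d_model)
    W_o = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
    return concat @ W_o

out = multi_head_self_attention(np.random.randn(8, 16), num_heads=4)
print(out.shape)  # (8, 16)
```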