Transformer
Attention is all you need!
The three most important keys to understanding the Transformer are:
1. Positional Encoding: injects order information without recurrence, which allows training to run in parallel across positions (see the sketch after this list)
2. Encoder-Decoder mechanism
3. (Multi-head) (Self) Attention Mechanism
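Below is a minimal NumPy sketch of the sinusoidal positional encoding from "Attention Is All You Need"; the function name and the example dimensions are illustrative, not from this page.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) array that is added to the token embeddings.

    Because position is encoded directly into the input, the model needs no
    recurrence and can process the whole sequence in parallel.
    """
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                   # (1, d_model)
    # Each dimension pair uses a different wavelength: 10000^(2i / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                      # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                 # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                 # odd dims: cosine
    return pe

# Example: a 10-token sequence with model dimension 16.
print(sinusoidal_positional_encoding(10, 16).shape)      # (10, 16)
```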
Make sure to watch the following three videos "in order" so that you don't get overwhelmed immediately.
Note: to enable the network "learn" while doing attention, input is projected to a lower dimension using different linear projections (heads). different attentions will be learned from those H heads and added together to yield a final result.
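A minimal NumPy sketch of that idea, with random weights standing in for learned projections; the function name and shapes are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def multi_head_self_attention(x, num_heads, seed=0):
    """Toy multi-head self-attention over x of shape (seq_len, d_model).

    Each head projects the input to the lower dimension d_model // num_heads,
    computes scaled dot-product attention there, and the head outputs are
    concatenated and passed through a final output projection.
    """
    rng = np.random.default_rng(seed)
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Per-head projections (random here; learned in a real model).
        w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        q, k, v = x @ w_q, x @ w_k, x @ w_v                # (seq_len, d_head)
        scores = q @ k.T / np.sqrt(d_head)                 # (seq_len, seq_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
        heads.append(weights @ v)                          # (seq_len, d_head)
    concat = np.concatenate(heads, axis=-1)                # (seq_len, d_model)
    w_o = rng.normal(size=(d_model, d_model))              # output projection
    return concat @ w_o

# Example: 5 tokens, model dimension 8, 2 heads.
x = np.random.default_rng(1).normal(size=(5, 8))
print(multi_head_self_attention(x, num_heads=2).shape)     # (5, 8)
```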