Transformer

Attention is all you need!

The three most important keys to understanding the Transformer are:

1. Positional Encoding: injects token-order information so the whole sequence can be processed in parallel (see the sketch after this list)

2. Encoder-Decoder architecture

3. Multi-Head Self-Attention mechanism
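
As a concrete illustration of point 1, here is a minimal NumPy sketch of the sinusoidal positional encoding used in the original paper; the function name and the example shapes are my own choices, not from this note.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (sketch, following "Attention Is All You Need").

    Each position gets a unique vector of sines and cosines, so the model can
    recover token order even though all positions are processed in parallel.
    """
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                  # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                    # (seq_len, d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])               # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])               # odd dimensions: cosine
    return pe

# Example: encodings for a 10-token sequence with model dimension 16
print(sinusoidal_positional_encoding(10, 16).shape)     # (10, 16)
```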

Make sure to watch the following three videos in order so that you don't get overwhelmed immediately.

• Note: to let the network "learn" while doing attention, the input is projected to a lower dimension using H different linear projections (heads). H separate attention outputs are computed from those H heads, then concatenated and projected again to yield the final result (see the sketch below).
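
The following is a minimal NumPy sketch of that idea. The weight matrices are random placeholders standing in for learned parameters, and the function name and shapes are illustrative assumptions rather than anything from this note.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, seed=0):
    """Multi-head self-attention sketch (weights are random placeholders).

    x: (seq_len, d_model). Each head projects the input down to
    d_k = d_model // num_heads, attends in that lower-dimensional space,
    and the head outputs are concatenated and passed through a final
    output projection.
    """
    rng = np.random.default_rng(seed)
    seq_len, d_model = x.shape
    d_k = d_model // num_heads

    heads = []
    for _ in range(num_heads):
        # Per-head projections (would be learned in a real model)
        w_q = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
        w_k = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
        w_v = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)

        q, k, v = x @ w_q, x @ w_k, x @ w_v              # (seq_len, d_k)
        scores = (q @ k.T) / np.sqrt(d_k)                # scaled dot-product
        heads.append(softmax(scores) @ v)                # (seq_len, d_k)

    concat = np.concatenate(heads, axis=-1)              # (seq_len, d_model)
    w_o = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    return concat @ w_o                                  # final output projection

# Example: 5 tokens, model dimension 16, 4 heads of size 4
x = np.random.default_rng(1).standard_normal((5, 16))
print(multi_head_self_attention(x, num_heads=4).shape)   # (5, 16)
```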
