Transformer

Attention is all you need!

The 3 most important keys to understanding the Transformer are:

2. Encoder-Decoder mechanism

3. (Multi-head) (Self) Attention Mechanism

Make sure to watch the following three videos in order so that you don't get overwhelmed right away.

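Note: to enable the network to "learn" while doing attention, the input is projected to a lower dimension using $H$ different learned linear projections (heads). Attention is computed independently in each of the $H$ heads, and the head outputs are concatenated and passed through a final linear projection to yield the result (equivalently, each head's output is projected back to the model dimension and the $H$ results are added together).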
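To make the note above concrete, here is a minimal NumPy sketch of multi-head self-attention. It is an illustration under assumptions, not a reference implementation: the function name `multi_head_self_attention`, the weight names `w_q`, `w_k`, `w_v`, `w_o`, the sizes, and the random weights are all made up for this example, and masking, biases, and dropout are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o):
    """Hypothetical helper for illustration.

    x:             (T, d_model) input sequence
    w_q, w_k, w_v: (H, d_model, d_head) per-head projections
    w_o:           (H * d_head, d_model) output projection
    """
    # Project the input into H lower-dimensional query/key/value spaces.
    q = np.einsum("td,hdk->htk", x, w_q)  # (H, T, d_head)
    k = np.einsum("td,hdk->htk", x, w_k)  # (H, T, d_head)
    v = np.einsum("td,hdk->htk", x, w_v)  # (H, T, d_head)

    # Scaled dot-product attention, computed independently per head.
    d_head = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, T, T)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    heads = weights @ v                                  # (H, T, d_head)

    # Concatenate the heads along the feature axis, then project.
    concat = heads.transpose(1, 0, 2).reshape(x.shape[0], -1)  # (T, H * d_head)
    return concat @ w_o, weights


# Assumed toy sizes: 4 tokens, model width 8, 2 heads of width 4.
rng = np.random.default_rng(0)
T, d_model, H = 4, 8, 2
d_head = d_model // H
x = rng.normal(size=(T, d_model))
w_q, w_k, w_v = (rng.normal(size=(H, d_model, d_head)) for _ in range(3))
w_o = rng.normal(size=(H * d_head, d_model))

out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o)
print(out.shape)      # (4, 8)
print(weights.shape)  # (2, 4, 4)
```

Each head attends over the full sequence but in its own `d_head`-dimensional subspace, so the output keeps the input shape `(T, d_model)` while the `H` attention patterns can differ from one another.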
