
Transformer

Attention is all you need!


Last updated 1 year ago

The three most important keys to understanding the Transformer are:

  1. Positional encoding: injects token-order information, which lets the model process the whole sequence in parallel during training

  2. Encoder-decoder mechanism

  3. (Multi-head) (self-)attention mechanism
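To make the first key concrete, here is a minimal sketch of the sinusoidal positional encoding from "Attention is all you need", written with NumPy. The function name `sinusoidal_positional_encoding` and the toy sizes are assumptions chosen for illustration, and the sketch assumes an even `d_model`.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    if d_model % 2 != 0:
        raise ValueError("this sketch assumes an even d_model")
    positions = np.arange(seq_len)[:, None]        # shape (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]  # the values 2i, shape (1, d_model / 2)
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even feature indices use sine
    pe[:, 1::2] = np.cos(angles)  # odd feature indices use cosine
    return pe


# Hypothetical usage: add the encoding to (here all-zero) token embeddings.
embeddings = np.zeros((4, 8))
encoded = embeddings + sinusoidal_positional_encoding(4, 8)
```

Because the encoding depends only on the position index, it can be computed for every token at once, so no step-by-step recurrence is needed to tell the model where each token sits.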

Make sure to watch the following three videos in order so that you don't get overwhelmed right away.

  • Note: to let the network "learn" while doing attention, the input is projected to a lower dimension using H different linear projections (heads). Each of the H heads learns its own attention, and the head outputs are concatenated and passed through a final linear projection to yield the result.
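The note above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names, the random weights, and the toy sizes (5 tokens, model width 8, H = 2) are all assumptions. The H per-head projections are packed into single `d_model x d_model` matrices and split afterwards, which is a common way to write it.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x of shape (seq_len, d_model) with num_heads heads."""
    seq_len, d_model = x.shape
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads  # each head works in a lower dimension

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (H, seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (H, seq_len, d_head)

    # Concatenate the heads, then apply the final output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


# Hypothetical usage with random inputs and weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v, w_o = (rng.normal(size=(8, 8)) for _ in range(4))
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads=2)
```

Here `out` has the same shape as `x`, and `attn` holds one `seq_len x seq_len` attention map per head, so each head can attend to a different pattern in the sequence.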