The R-CNN family

R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN


Last updated 2 years ago

Traditional Approach

  • Region Proposal Generator: Selective Search, Edge Boxes

  • Image descriptors: histogram of oriented gradients (HOG)

  • Classifier: Support Vector Machine

R-CNN

  1. Extract features using a CNN network

Fast R-CNN

Fast R-CNN builds a network that has only a single stage:

  1. Input image is fed into a pretrained CNN network to get a feature map

  2. The input image is also used to generate n region proposals (e.g. with selective search)

  3. Projecting the region proposals onto the feature map yields n regions of interest (ROIs)

  4. ROI pooling layers are used to extract a fixed-length feature vector from those ROIs
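
Steps 3 and 4 can be illustrated with a minimal pure-Python sketch. It assumes a single-channel feature map stored as a list of rows, boxes given as inclusive integer corners `(x1, y1, x2, y2)`, and a hypothetical `spatial_scale` that maps image coordinates to feature-map coordinates. The helper names `project_roi` and `roi_max_pool` are made up for this illustration and are not taken from any library.

```python
def project_roi(box, spatial_scale):
    """Map an image-space box (x1, y1, x2, y2) onto the feature map.

    Coordinates are rounded to integer cells (an assumption of this sketch).
    """
    return tuple(int(round(v * spatial_scale)) for v in box)


def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one ROI into a fixed out_h x out_w grid.

    feature_map: list of rows (H x W), single channel.
    roi: inclusive integer corners (x1, y1, x2, y2) on the feature map.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Bin rows: floor for the start, ceil for the end, so no bin is empty.
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + ((i + 1) * roi_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# A 4 x 4 feature map holding the values 0..15.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]
roi = project_roi((0, 0, 20, 20), spatial_scale=0.1)  # (0, 0, 2, 2)
print(roi_max_pool(feature_map, roi))  # [[5, 6], [9, 10]]
```

Whatever the size of the ROI, the output grid has the same shape, which is what lets the following fully connected layers take a fixed-length input.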

Fast R-CNN is faster than R-CNN because:

  1. Fast R-CNN shares computations (i.e. convolutional layer calculations) across all proposals (i.e. ROIs) rather than doing the calculations for each proposal independently.

  2. Fast R-CNN does not cache the extracted features and thus does not need so much disk storage compared to R-CNN, which needs hundreds of gigabytes.

Faster R-CNN

  1. Replaces the selective search step of Fast R-CNN with a Region Proposal Network (RPN).
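
A Region Proposal Network scores a fixed set of reference boxes (anchors) placed at every feature-map location. The sketch below only shows how such an anchor grid might be laid out. It does not include the learned classification and box-regression heads. The function name `generate_anchors`, the `stride` parameter, and the default scales and aspect ratios are assumptions chosen for illustration.

```python
import math


def generate_anchors(feat_h, feat_w, stride,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) tuples in image coordinates.

    One anchor per (location, scale, ratio). `ratio` is height / width,
    and every anchor of a given scale has an area of about scale ** 2.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell, in image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feat_h=2, feat_w=3, stride=16)
print(len(anchors))  # 2 * 3 locations * 9 anchors each = 54
```

Because the anchors are computed on the shared feature map, proposing regions reuses the backbone's convolutional features instead of running a separate algorithm on the raw image.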

Mask R-CNN

  1. Replaces the region of interest pooling layer with the region of interest (RoI) alignment layer

  2. More suitable for pixel-level prediction (e.g. instance segmentation), since RoI align avoids rounding ROI coordinates to integer feature-map cells
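
The idea behind RoI align can be sketched in pure Python: rather than rounding ROI boundaries to integer cells, each output bin averages a few points sampled at fractional positions by bilinear interpolation. The sketch assumes a single-channel feature map stored as a list of rows, with the value at index `(r, c)` located at continuous coordinate `(r, c)`. The names `bilinear_sample`, `roi_align`, and `samples_per_bin` are hypothetical, and real implementations differ in their coordinate conventions.

```python
import math


def bilinear_sample(feature_map, y, x):
    """Interpolate the feature map at a fractional (y, x) position."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp to the valid range so neighbours always exist.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature_map, roi, output_size=(2, 2), samples_per_bin=2):
    """Average bilinear samples inside each bin of a fractional ROI.

    roi: (x1, y1, x2, y2) in continuous feature-map coordinates.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for sy in range(samples_per_bin):
                for sx in range(samples_per_bin):
                    # Regularly spaced sample points inside the bin.
                    y = y1 + (i + (sy + 0.5) / samples_per_bin) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples_per_bin) * bin_w
                    total += bilinear_sample(feature_map, y, x)
            row.append(total / samples_per_bin ** 2)
        pooled.append(row)
    return pooled


# A feature map whose value equals its column index.
ramp = [[float(c) for c in range(4)] for _ in range(4)]
print(roi_align(ramp, (0.5, 0.5, 2.5, 2.5)))  # [[1.0, 2.0], [1.0, 2.0]]
```

On this ramp the pooled values equal the bin centres' x coordinates, so sub-cell shifts of the ROI change the output smoothly. That alignment is what pixel-level mask prediction relies on.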

Resources

14.8. Region-based CNNs (R-CNNs) — Dive into Deep Learning 1.0.0-beta0 documentation
Faster R-CNN Explained for Object Detection Tasks | Paperspace Blog