The R-CNN family

R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN

Traditional Approach

  • Region proposal generator: Selective Search, Edge Boxes

  • Image descriptors: histogram of oriented gradients (HOG)

  • Classifier: Support Vector Machine
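
To make the descriptor step concrete, here is a minimal sketch of the core idea behind HOG: a magnitude-weighted histogram of gradient orientations. It is a simplification, not full HOG (which also divides the window into cells and normalizes over blocks), and the function name `orientation_histogram` and the 9-bin default are assumptions made for illustration. The resulting vector is what a classifier such as an SVM would take as input.

```python
import math


def orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees).

    `patch` is a 2D list of grayscale intensities. This is a single-cell
    simplification of HOG, without block normalization.
    """
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Central differences approximate the image gradient.
            gx = patch[r][c + 1] - patch[r][c - 1]
            gy = patch[r + 1][c] - patch[r - 1][c]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


# Intensity grows left to right, so every gradient points along the x axis.
horizontal_ramp = [[c for c in range(5)] for _ in range(5)]
# Intensity grows top to bottom, so every gradient points along the y axis.
vertical_ramp = [[r for _ in range(5)] for r in range(5)]

ramp_x_hist = orientation_histogram(horizontal_ramp)
ramp_y_hist = orientation_histogram(vertical_ramp)
```

In this toy input all of the gradient energy of the first patch falls into the first bin (around 0 degrees) and all of the second patch's energy falls into the middle bin (around 90 degrees), which is the kind of shape cue the classifier relies on.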

R-CNN

  1. Extract features from each region proposal using a CNN

Fast R-CNN

Fast R-CNN builds a network that can be trained in a single stage:

  1. The input image is fed into a pretrained CNN to get a feature map

  2. The input image is also used to generate n region proposals

  3. Projecting the region proposals onto the feature map yields n regions of interest (ROIs)

  4. An ROI pooling layer extracts a fixed-length feature vector from each ROI
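
The projection and pooling steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the reference implementation: the feature map is a single-channel 2D list, the backbone's downsampling factor is passed in as `stride`, and the helper names `project_roi` and `roi_max_pool` are hypothetical.

```python
def project_roi(box, stride, feat_h, feat_w):
    """Map an (x1, y1, x2, y2) box in image pixels to feature-map cell indices.

    The top-left corner is floored and the bottom-right corner is ceiled,
    then the result is clamped so the ROI covers at least one cell.
    """
    x1, y1, x2, y2 = box
    fx1 = min(max(int(x1 // stride), 0), feat_w - 1)
    fy1 = min(max(int(y1 // stride), 0), feat_h - 1)
    fx2 = min(max(-int(-x2 // stride), fx1 + 1), feat_w)
    fy2 = min(max(-int(-y2 // stride), fy1 + 1), feat_h)
    return fx1, fy1, fx2, fy2


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the ROI into a fixed out_h x out_w grid, whatever its size."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Bin start is floored and bin end is ceiled, so bins are never empty.
        r0 = y1 + (i * h) // out_h
        r1 = y1 + max(((i + 1) * h + out_h - 1) // out_h, (i * h) // out_h + 1)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * w) // out_w
            c1 = x1 + max(((j + 1) * w + out_w - 1) // out_w, (j * w) // out_w + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4x4 feature map for a 64x64 image (stride 16), with value = 4 * row + col.
feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]
roi = project_roi((0, 0, 64, 64), stride=16, feat_h=4, feat_w=4)
pooled = roi_max_pool(feature_map, roi, out_h=2, out_w=2)
print(pooled)  # [[5, 7], [13, 15]]
```

Because the output grid has a fixed size, ROIs of different shapes all produce vectors of the same length once flattened, which is what lets them feed the same fully connected layers.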

Fast R-CNN is faster than R-CNN because:

  1. Fast R-CNN shares computations (i.e. convolutional layer calculations) across all proposals (i.e. ROIs) rather than doing the calculations for each proposal independently.

  2. Fast R-CNN does not cache the extracted features on disk, so it needs far less storage than R-CNN, which requires hundreds of gigabytes.

Faster R-CNN

  1. Replaces the Selective Search step in Fast R-CNN with a Region Proposal Network (RPN).
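
A Region Proposal Network works on the shared feature map: at every feature-map location it scores and refines a fixed set of reference boxes called anchors. The sketch below only enumerates those anchors; the learned scoring and box regression are omitted. The function name `generate_anchors` is hypothetical, and the stride, scales, and aspect ratios are illustrative values chosen for this example.

```python
import math


def generate_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return anchors as (x1, y1, x2, y2) boxes in image coordinates.

    One anchor is created per (location, scale, ratio) combination.
    `ratio` is height / width, and each anchor keeps an area of scale ** 2.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell, mapped back to image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feat_h=2, feat_w=3, stride=16,
                           scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0))
print(len(anchors))  # 54 = 2 * 3 locations * 9 anchors per location
```

Since the anchors are derived from the feature map that the detector already computes, proposing regions adds little extra cost compared with running a separate algorithm on the raw image.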

Mask R-CNN

  1. Replaces the region of interest (RoI) pooling layer with an RoI alignment (RoIAlign) layer

  2. More suitable for pixel-level prediction (instance segmentation)
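
The difference from RoI pooling is that RoI alignment never rounds box coordinates to whole feature-map cells; it samples the feature map at fractional positions with bilinear interpolation and averages the samples in each output bin. The following is a minimal sketch of that idea, assuming a single-channel 2D list as the feature map and a simplified coordinate convention in which `feature_map[r][c]` is the value at point (x=c, y=r). The names `bilinear` and `roi_align` and the default of 2x2 samples per bin are assumptions for illustration; library implementations may differ in their pixel-offset conventions.

```python
def bilinear(feature_map, y, x):
    """Bilinearly interpolate the feature map at a fractional (y, x) position."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp to the valid range so samples near the border stay defined.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature_map, roi, out_h, out_w, samples=2):
    """Average bilinear samples in each bin of a continuous (x1, y1, x2, y2) RoI."""
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Regularly spaced sample points inside the bin, no rounding.
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feature_map, y, x)
            row.append(total / (samples * samples))
        output.append(row)
    return output


# Toy 4x4 feature map with value = 4 * row + col.
feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]
# The RoI has fractional corners, which RoI pooling would have to round.
aligned = roi_align(feature_map, (0.5, 0.5, 2.5, 2.5), out_h=2, out_w=2)
```

Keeping the fractional coordinates preserves the spatial correspondence between the pooled features and the input pixels, which matters when the output is a per-pixel mask rather than a single class label per box.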

Resources
