Computer Vision


The R-CNN family

R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN

Traditional Approach

Region proposal generator: Selective Search, Edge Boxes
Image descriptors: histogram of oriented gradients (HOG)
Classifier: Support Vector Machine

R-CNN

Extracts features from each region proposal independently using a CNN, replacing hand-crafted descriptors such as HOG

Fast R-CNN

Fast R-CNN builds a network that is trained in a single stage:

The input image is fed into a pretrained CNN to get a feature map
The input image is also used to generate $n$ region proposals (e.g. with Selective Search)
Projecting the region proposals onto the feature map yields $n$ regions of interest (RoIs)
An RoI pooling layer extracts a fixed-length feature vector from each RoI
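
The projection and pooling steps above can be sketched in plain Python. This is only an illustration, not the layer used by any particular library: the function names `project_roi` and `roi_max_pool`, the backbone stride of 16, and the 2x2 output size are all assumptions chosen for readability. It shows why RoIs of different sizes still yield outputs of the same shape.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return int(-(-a // b))


def project_roi(box, stride=16):
    """Map an image-space box (x1, y1, x2, y2) to feature-map cells.

    stride=16 is an assumed total downsampling factor of the backbone.
    The returned end coordinates are exclusive.
    """
    x1, y1, x2, y2 = box
    fx1, fy1 = int(x1 // stride), int(y1 // stride)
    fx2 = max(fx1 + 1, ceil_div(x2, stride))  # keep at least one cell
    fy2 = max(fy1 + 1, ceil_div(y2, stride))
    return fx1, fy1, fx2, fy2


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the cells inside roi into a fixed out_h x out_w grid."""
    fx1, fy1, fx2, fy2 = roi
    h, w = fy2 - fy1, fx2 - fx1
    pooled = []
    for i in range(out_h):
        # Row range of bin i; every bin covers at least one row.
        ys = fy1 + (i * h) // out_h
        ye = max(ys + 1, fy1 + ceil_div((i + 1) * h, out_h))
        row = []
        for j in range(out_w):
            xs = fx1 + (j * w) // out_w
            xe = max(xs + 1, fx1 + ceil_div((j + 1) * w, out_w))
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Toy single-channel 4x4 feature map (a 64x64 image at stride 16).
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

print(roi_max_pool(feature_map, project_roi((0, 0, 64, 64))))    # [[6, 8], [14, 16]]
print(roi_max_pool(feature_map, project_roi((16, 16, 48, 64))))  # [[10, 11], [14, 15]]
```

Both proposals cover a different number of feature-map cells, yet each is reduced to the same 2x2 grid, which is what lets the later fully connected layers accept any proposal. The sketch assumes the projected RoI lies inside the feature map; a real implementation would also clip it and handle many channels.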

Fast R-CNN is faster than R-CNN because:

Fast R-CNN shares computations (i.e. convolutional layer calculations) across all proposals (i.e. RoIs) rather than repeating the calculations for each proposal independently.
Fast R-CNN does not cache the extracted features on disk, so it needs far less storage than R-CNN, which needs hundreds of gigabytes.
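
A rough back-of-the-envelope calculation makes both points concrete. The numbers below (5,000 training images, 2,000 proposals per image, 4,096-dimensional float32 features) are illustrative assumptions, not figures taken from this page, and the helper names are hypothetical.

```python
def backbone_passes(num_images, proposals_per_image, shared):
    """Count CNN forward passes needed to featurize every proposal."""
    if shared:
        # Fast R-CNN style: one pass per image, reused by all proposals.
        return num_images
    # R-CNN style: one pass per cropped proposal.
    return num_images * proposals_per_image


def cached_feature_gb(num_images, proposals_per_image, feature_dim,
                      bytes_per_value=4):
    """Disk space in GB (10**9 bytes) to cache one vector per proposal."""
    total_bytes = (num_images * proposals_per_image
                   * feature_dim * bytes_per_value)
    return total_bytes / 1e9


# Assumed, illustrative workload.
NUM_IMAGES = 5000
PROPOSALS_PER_IMAGE = 2000
FEATURE_DIM = 4096

print(backbone_passes(NUM_IMAGES, PROPOSALS_PER_IMAGE, shared=False))  # 10000000
print(backbone_passes(NUM_IMAGES, PROPOSALS_PER_IMAGE, shared=True))   # 5000
print(cached_feature_gb(NUM_IMAGES, PROPOSALS_PER_IMAGE, FEATURE_DIM))  # 163.84
```

Under these assumptions, sharing the convolutional computation cuts the number of backbone passes by a factor equal to the number of proposals per image, and caching one feature vector per proposal already lands in the hundred-gigabyte range.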

Faster R-CNN

Replaces the Selective Search step of Fast R-CNN with a Region Proposal Network (RPN), which shares convolutional features with the detector.
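
Instead of running an external proposal algorithm, the proposal network scores and refines a fixed set of reference boxes ("anchors") centered on every feature-map location. The sketch below only generates those anchors. The function name `make_anchors`, the stride of 16, and the three scales and three aspect ratios are assumptions for illustration; the scoring and box-regression heads are not shown.

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) boxes in image coordinates.

    One anchor is created per (scale, ratio) pair at every feature-map
    cell, so the total count is feat_h * feat_w * len(scales) * len(ratios).
    """
    anchors = []
    for fy in range(feat_h):
        for fx in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx = (fx + 0.5) * stride
            cy = (fy + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; area stays close to scale**2.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # 54
print(anchors[1])    # (-56.0, -56.0, 72.0, 72.0)
```

Anchors near the border extend outside the image, as the second printed box shows; a full implementation would have to clip or ignore them before training the proposal network.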

Mask R-CNN

Replaces the region of interest (RoI) pooling layer with an RoI align layer, which avoids rounding RoI coordinates to whole feature-map cells
More suitable for pixel-level prediction (instance segmentation), for which it adds a mask prediction branch
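
The difference between pooling and alignment comes down to how fractional coordinates are handled: pooling snaps them to whole cells, while alignment reads the feature map at the exact fractional location with bilinear interpolation. The following is a minimal single-channel sketch of that idea. The names `bilinear_sample` and `roi_align` are hypothetical, and taking a single sample at the center of each output bin is a simplifying assumption (implementations commonly average several samples per bin).

```python
import math


def bilinear_sample(feature_map, y, x):
    """Interpolate feature_map at a fractional (y, x) location."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp so that samples just outside the map reuse the border values.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature_map, roi, out_h=2, out_w=2):
    """Sample the center of each output bin without rounding the RoI.

    roi is (x1, y1, x2, y2) in fractional feature-map coordinates.
    """
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1) / out_h
    bin_w = (x2 - x1) / out_w
    return [
        [bilinear_sample(feature_map,
                         y1 + (i + 0.5) * bin_h,
                         x1 + (j + 0.5) * bin_w)
         for j in range(out_w)]
        for i in range(out_h)
    ]


# Toy map whose value encodes its position: value = 10 * row + column.
feature_map = [[10 * r + c for c in range(4)] for r in range(4)]

print(bilinear_sample(feature_map, 1.5, 2.25))       # 17.25
print(roi_align(feature_map, (0.5, 0.5, 2.5, 2.5)))  # [[11.0, 12.0], [21.0, 22.0]]
```

Because the toy map is linear in its coordinates, the interpolated values match the fractional positions exactly, which is the sub-cell precision that per-pixel mask prediction benefits from.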

Resources

14.8. Region-based CNNs (R-CNNs) — Dive into Deep Learning 1.0.0-beta0 documentation

Faster R-CNN Explained for Object Detection Tasks | Paperspace Blog

