# The R-CNN family

## Traditional Approach

<figure><img src="https://1026108543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfMiBkUCO0YX35Q3Joj8n%2Fuploads%2FG5UDE9ShcYv9PATcAP3z%2Fimage.png?alt=media&#x26;token=a42c659d-ec54-40d8-ad39-5d7ad6b5cded" alt=""><figcaption></figcaption></figure>

* Region proposal generator: Selective Search, Edge Boxes
* Image descriptor: histogram of oriented gradients (HOG)
* Classifier: support vector machine (SVM)
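
To make the image descriptor step more concrete, the sketch below computes a single gradient-orientation histogram for one image patch. It is a deliberate simplification of HOG (no cells, no block normalization, no vote interpolation), and `orientation_histogram` and `n_bins` are names chosen here for illustration only.

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one patch."""
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch)                       # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation in [0, 180]
    bin_width = 180.0 / n_bins
    bins = np.minimum((angle / bin_width).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist          # L2-normalize when possible

# A patch whose intensity increases left to right has purely horizontal gradients,
# so all of the histogram mass falls into the first orientation bin.
ramp = np.tile(np.arange(8.0), (8, 1))
descriptor = orientation_histogram(ramp)
```

In the traditional pipeline, a descriptor like this would be computed for every region proposal and passed to the classifier.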

## R-CNN

<figure><img src="https://1026108543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfMiBkUCO0YX35Q3Joj8n%2Fuploads%2FVvs98Qwe8iIprh8WtFHK%2Fimage.png?alt=media&#x26;token=996fa62d-3eb7-4d6d-b6fc-2c7b2c836807" alt=""><figcaption></figcaption></figure>

1. Replaces the hand-crafted image descriptor with features extracted by a CNN, which is run once per region proposal

## Fast R-CNN

<figure><img src="https://1026108543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfMiBkUCO0YX35Q3Joj8n%2Fuploads%2Fz2MEpHOn7cmnj00nWUib%2Fimage.png?alt=media&#x26;token=37ef6a38-16ee-4d0f-8d33-c315bf4563d8" alt=""><figcaption></figcaption></figure>

Fast R-CNN builds a network that has only a single stage:

1. The input image is fed into a pretrained CNN to produce a feature map
2. The input image is also used to generate $$n$$ region proposals
3. Projecting the region proposals onto the feature map yields $$n$$ regions of interest (ROIs)
4. An ROI pooling layer extracts a fixed-length feature vector from each ROI
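
Steps 3 and 4 can be sketched in NumPy as follows. This is a minimal illustration, not the reference implementation: the function name `roi_pool`, the floor/ceil quantization, and the `spatial_scale` parameter (feature-map size divided by image size) are assumptions made here, and real libraries may round coordinates differently.

```python
import numpy as np

def roi_pool(feature_map, rois, output_size=(2, 2), spatial_scale=1.0):
    """Max-pool each ROI of a (C, H, W) feature map into a fixed (C, out_h, out_w) grid.

    rois: sequence of (x1, y1, x2, y2) boxes in input-image coordinates.
    """
    channels, height, width = feature_map.shape
    out_h, out_w = output_size
    pooled = np.zeros((len(rois), channels, out_h, out_w), dtype=feature_map.dtype)
    for k, (x1, y1, x2, y2) in enumerate(rois):
        # Step 3: project the proposal onto the feature map (quantized to whole cells).
        fx1 = min(max(int(np.floor(x1 * spatial_scale)), 0), width - 1)
        fy1 = min(max(int(np.floor(y1 * spatial_scale)), 0), height - 1)
        fx2 = min(max(int(np.ceil(x2 * spatial_scale)), fx1 + 1), width)
        fy2 = min(max(int(np.ceil(y2 * spatial_scale)), fy1 + 1), height)
        # Step 4: split the ROI into out_h x out_w bins and take the max of each bin.
        ys = np.linspace(fy1, fy2, out_h + 1)
        xs = np.linspace(fx1, fx2, out_w + 1)
        for i in range(out_h):
            r0 = int(np.floor(ys[i]))
            r1 = max(int(np.ceil(ys[i + 1])), r0 + 1)   # each bin covers at least one row
            for j in range(out_w):
                c0 = int(np.floor(xs[j]))
                c1 = max(int(np.ceil(xs[j + 1])), c0 + 1)
                pooled[k, :, i, j] = feature_map[:, r0:r1, c0:c1].max(axis=(1, 2))
    return pooled

# A 4x4 single-channel feature map for a 16x16 image (spatial_scale = 1/4).
feature_map = np.arange(16, dtype=np.float64).reshape(1, 4, 4)
rois = [(0, 0, 16, 16), (0, 4, 12, 12)]            # two proposals of different sizes
pooled = roi_pool(feature_map, rois, spatial_scale=0.25)
vectors = pooled.reshape(len(rois), -1)            # one fixed-length vector per ROI
```

Both ROIs end up as vectors of the same length even though the proposals differ in size, which is what lets the downstream fully connected layers process them in one batch.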

Fast R-CNN is faster than R-CNN because:

1. Fast R-CNN shares computation (i.e. the convolutional layers) across all proposals (i.e. ROIs) rather than repeating it for each proposal independently.
2. Fast R-CNN does not cache the extracted features on disk, so it needs far less storage than R-CNN, which can require hundreds of gigabytes.
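
The first point can be illustrated with a toy experiment that counts backbone passes. Everything here is hypothetical: `CountingBackbone` is a stand-in for a CNN (plain 2x2 average pooling), and the proposals use even coordinates so that cropping the shared feature map lines up exactly with the stride-2 backbone. With a real CNN the two routes would give similar but not identical features.

```python
import numpy as np

class CountingBackbone:
    """Toy stand-in for a CNN: 2x2 average pooling that counts how often it runs."""
    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
        return image[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def rcnn_style(image, proposals, backbone):
    # R-CNN: crop every proposal from the image and run the backbone on each crop.
    return [backbone(image[y1:y2, x1:x2]) for x1, y1, x2, y2 in proposals]

def fast_rcnn_style(image, proposals, backbone):
    # Fast R-CNN: run the backbone once, then crop the shared feature map.
    feature_map = backbone(image)
    return [feature_map[y1 // 2:y2 // 2, x1 // 2:x2 // 2] for x1, y1, x2, y2 in proposals]

image = np.arange(64, dtype=np.float64).reshape(8, 8)
proposals = [(0, 0, 4, 4), (2, 2, 8, 8), (4, 0, 8, 6)]   # (x1, y1, x2, y2)
slow, fast = CountingBackbone(), CountingBackbone()
per_proposal = rcnn_style(image, proposals, slow)
shared = fast_rcnn_style(image, proposals, fast)
print(slow.calls, fast.calls)  # prints "3 1"
```

The number of backbone passes grows with the number of proposals in the first route and stays at one in the second.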

## Faster R-CNN

<figure><img src="https://1026108543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfMiBkUCO0YX35Q3Joj8n%2Fuploads%2FO7mRIrwdrov8xHG4IjFP%2Fimage.png?alt=media&#x26;token=7bc86928-bb1d-46ec-af0f-744a20f2b999" alt=""><figcaption></figcaption></figure>

1. Replaces the selective search step of Fast R-CNN with a region proposal network (RPN).
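
The page above does not describe the internals of the proposal network, so the following is only a sketch of one ingredient from the Faster R-CNN paper: the network scores a fixed set of anchor boxes centred on every feature-map cell. The function name `generate_anchors` and the stride of 16 are assumptions for illustration; the three scales and three aspect ratios follow the paper's default of nine anchors per location.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * A, 4) anchors as (x1, y1, x2, y2) in image pixels."""
    # One (width, height) pair per scale/ratio combination; ratio = height / width
    # and the box area stays at scale ** 2.
    shapes = np.array([(s / np.sqrt(r), s * np.sqrt(r)) for s in scales for r in ratios])
    # Centre of every feature-map cell, expressed in image coordinates.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)      # (H * W, 2)
    half = shapes / 2.0                                       # (A, 2)
    top_left = centers[:, None, :] - half[None, :, :]
    bottom_right = centers[:, None, :] + half[None, :, :]
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)

anchors = generate_anchors(feat_h=2, feat_w=3)
print(anchors.shape)  # prints "(54, 4)": 2 * 3 cells with 9 anchors each
```

In the full model, the network predicts an objectness score and box offsets for each of these anchors, and the highest-scoring refined boxes become the region proposals.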

## Mask R-CNN

<figure><img src="https://1026108543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfMiBkUCO0YX35Q3Joj8n%2Fuploads%2FouihsJbhUaukiCXUlkmh%2Fimage.png?alt=media&#x26;token=b54e937a-c58d-4efc-9f89-3cb4b536b334" alt=""><figcaption></figcaption></figure>

1. Replaces the ROI pooling layer with the *region of interest (ROI) alignment* layer, which uses bilinear interpolation instead of rounding to preserve spatial information on the feature map
2. More suitable for pixel-level prediction (instance segmentation)
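
The difference from ROI pooling can be sketched as follows: instead of rounding the ROI to whole feature-map cells, the feature map is sampled at fractional positions with bilinear interpolation. This is a simplified illustration that takes one sample at the centre of each bin, whereas Mask R-CNN averages several samples per bin. The names `bilinear_sample` and `roi_align` are chosen here for illustration.

```python
import numpy as np

def bilinear_sample(feature_map, y, x):
    """Bilinearly interpolate a (C, H, W) feature map at fractional position (y, x)."""
    _, height, width = feature_map.shape
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature_map[:, y0, x0] + dx * feature_map[:, y0, x1]
    bottom = (1 - dx) * feature_map[:, y1, x0] + dx * feature_map[:, y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feature_map, roi, output_size=(2, 2), spatial_scale=1.0):
    """Sample one bilinear value at the centre of each bin; coordinates are never rounded."""
    x1, y1, x2, y2 = (c * spatial_scale for c in roi)
    out_h, out_w = output_size
    bin_h, bin_w = (y2 - y1) / out_h, (x2 - x1) / out_w
    out = np.zeros((feature_map.shape[0], out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            center_y = y1 + (i + 0.5) * bin_h
            center_x = x1 + (j + 0.5) * bin_w
            out[:, i, j] = bilinear_sample(feature_map, center_y, center_x)
    return out

# The value at cell (y, x) is 4 * y + x, so interpolated values are easy to check by hand.
feature_map = np.arange(16, dtype=np.float64).reshape(1, 4, 4)
aligned = roi_align(feature_map, roi=(0, 0, 3, 3))
print(aligned[0])  # values 3.75, 5.25 in the first row and 9.75, 11.25 in the second
```

The bin centres fall at 0.75 and 2.25, between feature-map cells, and the output reflects those exact positions rather than the nearest whole cell.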

## Resources

{% embed url="<https://d2l.ai/chapter_computer-vision/rcnn.html>" %}

{% embed url="<https://blog.paperspace.com/faster-r-cnn-explained-object-detection/>" %}

{% embed url="<https://www.youtube.com/watch?v=vr5rs_cTKCs>" %}
