
Viola-Jones & AdaBoost overview


Here is a university project I particularly enjoyed working on, and I hope some of the ideas here are useful to somebody else.

  •  The task: Computer Vision project with something to do with Face Detection.
  •  The project: Simplified Viola-Jones implementation (in MATLAB), from the ground up.

Following is an overview. I won’t publish any code; you’d miss out on all the fun!

First, reading and sources:

Of course, the original paper: Robust Real-Time Face Detection. Viola P.; Jones M.J. (2001)

And the material from CSE 455: Computer Vision, Shapiro L. (U. Washington) (2017): image datasets, more down-to-earth theory and implementation suggestions.

Quick overview:

Some “features” made up of rectangles are generated randomly within some bounds. One can compute the value of each feature for a particular image by adding the pixel values under the white rectangles and subtracting those under the black rectangles. What’s called an Integral Image speeds up these computations significantly.
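
To make the integral image idea concrete, here is a minimal Python sketch (not the project's MATLAB code). The function names, the two-rectangle layout (white left half, black right half) and the tiny 3x4 test image are all made up for illustration. The point is that once the table is built, any rectangle sum costs four lookups, whatever its size.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # everything above this row plus the running sum of this row
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """White left half minus black right half (w is assumed to be even)."""
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return white - black


# Made-up 3x4 "image" just to exercise the functions
img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(img)
whole = rect_sum(ii, 0, 0, 4, 3)           # all twelve pixels
inner = rect_sum(ii, 1, 1, 2, 2)           # 6 + 7 + 10 + 11
feature = two_rect_feature(ii, 0, 0, 4, 3)  # left two columns minus right two
```

The extra row and column of zeros avoid special-casing rectangles that touch the top or left edge of the image.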

The name of the game is identifying which features best tell faces apart from non-faces. This is the result of training over an image dataset.

Training: Overview:

A schematic of the training process is shown below. Thousands (or tens of thousands) of images are processed, computing a number of features on each. Then this data, together with labels indicating which images are faces and which are not, is passed to the AdaBoost learning algorithm, which chooses the best features and assigns each a weight as a function of its error.

The best classifiers are saved for later use in detection.

Training: AdaBoost:

The following diagrams describe the workings of AdaBoost. Four images of faces and four backgrounds are used as training data in this example, and we are calculating two features only.

The colored discs hold the feature value of each image; blue means face and red means background. As seen in the diagram, the values of each feature are sorted numerically, keeping track of which values belong to faces and which don’t. Disc size represents the “weight” of each image for error calculation. At the start all image weights are equal. We’ll come back to this later.

Once the features are ordered, the first iteration begins:

  • First, for each feature, identify the best threshold (i.e. the one with minimum error) that separates both categories, as well as a polarity (i.e. whether faces lie above or below the threshold).
  • Then choose the best feature for the current iteration, the one with smallest error.

  • We may now turn this feature into a weak classifier by assigning it a weight: the lower its error, the bigger the weight. In case of perfect classification (zero error), quite unlikely with a big dataset, the weight is capped, as it would otherwise be infinite.
  •  Finally, image weights are adjusted so that images that were incorrectly classified have a higher weight.
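
The four steps above can be sketched in Python for a single iteration. This is an illustration under assumptions, not the project's code: the function names are hypothetical, the feature values are invented to mirror the four-faces, four-backgrounds, two-features example, and the weight formulas (beta = error / (1 - error), alpha = log(1 / beta)) follow the original paper. The error is clamped with a small `eps` so that a perfect feature does not get an infinite weight.

```python
import math


def stump_predict(value, threshold, polarity):
    """Return 1 (face) or 0 (background) for a single feature value."""
    if polarity == 1:
        return 1 if value >= threshold else 0
    return 1 if value < threshold else 0


def best_stump(values, labels, weights):
    """Scan one feature's sorted values; return (error, threshold, polarity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total_pos = sum(w for w, y in zip(weights, labels) if y == 1)
    total_neg = sum(w for w, y in zip(weights, labels) if y == 0)
    below_pos = below_neg = 0.0  # weight of faces / backgrounds below the threshold
    best = (float("inf"), None, None)
    previous = None
    for i in order:
        if values[i] != previous:  # test each distinct value once
            # polarity +1 calls everything below the threshold "background"
            err_plus = below_pos + (total_neg - below_neg)
            # polarity -1 calls everything below the threshold "face"
            err_minus = below_neg + (total_pos - below_pos)
            if err_plus < best[0]:
                best = (err_plus, values[i], 1)
            if err_minus < best[0]:
                best = (err_minus, values[i], -1)
            previous = values[i]
        if labels[i] == 1:
            below_pos += weights[i]
        else:
            below_neg += weights[i]
    return best


def adaboost_round(features, labels, weights, eps=1e-10):
    """One iteration; features[j][i] is the value of feature j on image i."""
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize before measuring error
    best = None
    for j, values in enumerate(features):
        err, thr, pol = best_stump(values, labels, weights)
        if best is None or err < best[0]:
            best = (err, j, thr, pol)
    err, j, thr, pol = best
    err = min(max(err, eps), 1 - eps)  # cap so alpha stays finite
    beta = err / (1 - err)
    alpha = math.log(1 / beta)
    # Shrink the weight of correctly classified images; after the next
    # normalization the misclassified ones therefore weigh relatively more.
    new_weights = [
        w * beta if stump_predict(features[j][i], thr, pol) == labels[i] else w
        for i, w in enumerate(weights)
    ]
    return {"feature": j, "threshold": thr, "polarity": pol, "alpha": alpha}, new_weights


# Invented values: images 0-3 are faces, images 4-7 are backgrounds
features = [
    [5, 7, 8, 9, 1, 2, 3, 6],
    [4, 1, 6, 2, 3, 5, 7, 8],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights = [1 / 8] * 8
stump, new_weights = adaboost_round(features, labels, weights)
```

With these numbers the first feature wins: it only gets the last background wrong, and that image is the one left with the largest weight going into the next iteration.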

The next iteration commences with the updated weights, and the process continues until the desired number of classifiers is reached.

The final result is the strong classifier, shown below. Its threshold can be varied, as a trade-off between sensitivity and false positive rate.
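
As a rough Python sketch of how such a strong classifier could be evaluated: each weak classifier votes with its weight, and the window counts as a face when the share of "face" votes reaches a threshold. The dictionary layout, the function names and the three weak classifiers are assumptions made up for this example. A threshold of 0.5 corresponds to the weighted majority vote in the original paper.

```python
def stump_predict(value, threshold, polarity):
    """Return 1 (face) or 0 (background) for a single feature value."""
    if polarity == 1:
        return 1 if value >= threshold else 0
    return 1 if value < threshold else 0


def strong_score(feature_values, weak_classifiers):
    """Fraction of the total weight (alpha) that voted 'face', between 0 and 1."""
    total = sum(c["alpha"] for c in weak_classifiers)
    votes = sum(
        c["alpha"] * stump_predict(feature_values[c["feature"]], c["threshold"], c["polarity"])
        for c in weak_classifiers
    )
    return votes / total


def strong_classify(feature_values, weak_classifiers, threshold=0.5):
    """Lower thresholds catch more faces but let more false positives through."""
    return strong_score(feature_values, weak_classifiers) >= threshold


# Three made-up weak classifiers and the feature values of one window
weak_classifiers = [
    {"feature": 0, "threshold": 5, "polarity": 1, "alpha": 2.0},
    {"feature": 1, "threshold": 4, "polarity": 1, "alpha": 1.0},
    {"feature": 2, "threshold": 6, "polarity": -1, "alpha": 1.0},
]
window = [10, 3, 7]
score = strong_score(window, weak_classifiers)
lenient = strong_classify(window, weak_classifiers, threshold=0.5)
strict = strong_classify(window, weak_classifiers, threshold=0.6)
```

Here only the first (heaviest) classifier votes "face", so the same window is accepted at the lenient threshold and rejected at the stricter one.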

Detection:

Finally, the detection process is outlined below. It is pretty straightforward after the training is done. The image is scanned, and each window evaluated. Those that are classified as faces are framed in red. Then comes non-maximum suppression, which consolidates multiple detections of the same face into one. Example images from here and here.
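
The scanning and consolidation steps might look like the following Python sketch. Everything here is an assumption for illustration: the 24-pixel base window, the scale and stride factors, the `(x, y, size, score)` tuple layout and the greedy overlap-based suppression, which is one common way to do non-maximum suppression and not necessarily the one used in this project.

```python
def sliding_windows(img_w, img_h, base=24, scale_step=1.25, stride_frac=0.1):
    """Yield square windows (x, y, size) over the image at increasing scales."""
    size = base
    while size <= min(img_w, img_h):
        stride = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield (x, y, size)
        size = max(size + 1, int(size * scale_step))  # always grow


def iou(a, b):
    """Intersection over union of two square detections (x, y, size, score)."""
    ax, ay, an = a[:3]
    bx, by, bn = b[:3]
    iw = min(ax + an, bx + bn) - max(ax, bx)
    ih = min(ay + an, by + bn) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (an * an + bn * bn - inter)


def non_max_suppression(detections, iou_threshold=0.3):
    """Keep the best-scoring detection of each overlapping group."""
    kept = []
    for det in sorted(detections, key=lambda d: d[3], reverse=True):
        if all(iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Made-up detections: two overlapping hits on one face, plus a separate one
detections = [
    (12, 11, 24, 0.8),
    (10, 10, 24, 0.9),
    (100, 50, 24, 0.7),
]
faces = non_max_suppression(detections)
windows = list(sliding_windows(30, 24))
```

In a real detector each window from `sliding_windows` would be scored by the strong classifier, and only the windows classified as faces would be passed on to `non_max_suppression`.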

Note that this implementation has a single layer. One of the core ideas of Viola-Jones is a cascaded detector, in which many such layers are chained, each with its strong classifier threshold tweaked so that windows not likely to be faces get promptly discarded. The others progress down the cascade, and those that make it through all layers are classified as faces. This gif is pretty illustrative.
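
Since this project stops at one layer, the following Python sketch of a cascade is purely illustrative: the layer structure, the names and the numbers are made up. It shows the early-exit idea, where a window is dropped by the first layer whose vote falls below that layer's threshold, so most background windows cost very little to evaluate.

```python
def stump_predict(value, threshold, polarity):
    """Return 1 (face) or 0 (background) for a single feature value."""
    if polarity == 1:
        return 1 if value >= threshold else 0
    return 1 if value < threshold else 0


def strong_score(feature_values, weak_classifiers):
    """Fraction of the total weight (alpha) that voted 'face'."""
    total = sum(c["alpha"] for c in weak_classifiers)
    votes = sum(
        c["alpha"] * stump_predict(feature_values[c["feature"]], c["threshold"], c["polarity"])
        for c in weak_classifiers
    )
    return votes / total


def cascade_classify(feature_values, layers):
    """Return (is_face, layers_evaluated); reject at the first failing layer."""
    for depth, (weak_classifiers, threshold) in enumerate(layers, start=1):
        if strong_score(feature_values, weak_classifiers) < threshold:
            return False, depth
    return True, len(layers)


# Two made-up layers: a cheap lenient one, then a stricter one
layers = [
    ([{"feature": 0, "threshold": 5, "polarity": 1, "alpha": 1.0}], 0.5),
    (
        [
            {"feature": 0, "threshold": 8, "polarity": 1, "alpha": 1.0},
            {"feature": 1, "threshold": 4, "polarity": -1, "alpha": 1.0},
        ],
        0.75,
    ),
]
likely_face = cascade_classify([10, 3], layers)
background = cascade_classify([2, 3], layers)
near_miss = cascade_classify([6, 9], layers)
```

The second window never reaches the second layer, which is where the speed-up of a cascade comes from.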

Finally, some numbers: how good is it?

Not bad, but it is pretty slow. The single-layer approach and MATLAB are to blame here: the images above took about 60 s each.

Subtleties:

There are many, many things we haven’t looked at: feature calculation, image normalization, data structures… As always, the devil is in the details.
