Viola-Jones & AdaBoost overview


Here is a university project I particularly enjoyed working on, and I hope some of the ideas here are useful to somebody else.

  •  The task: Computer Vision project with something to do with Face Detection.
  •  The project: Simplified Viola-Jones implementation (in MATLAB), from the ground up.

Following is an overview. I won’t publish any code; you’d miss out on all the fun!

First, reading and sources:

Of course, the original paper: Robust Real-Time Face Detection. Viola P.; Jones M.J. (2001)

And the material from CSE 455: Computer Vision, Shapiro L. (U. Washington) (2017): image datasets, more down-to-earth theory and implementation suggestions.

Quick overview:

Some “features” made up of rectangles are generated randomly within some bounds. One can compute the value of each feature for a particular image by adding the pixel values under the white rectangles and subtracting those under the black rectangles. What’s called an Integral Image speeds up these computations significantly.
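To make the integral image idea concrete, here is a minimal Python sketch (not the MATLAB code from the project). The function names and the left/right two-rectangle layout are my own assumptions for illustration. Once the table is built, any rectangle sum costs four lookups regardless of its size.

```python
def integral_image(img):
    """Return a summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels above and to the left of (x, y),
    so ii has (h + 1) rows and (w + 1) columns.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Hypothetical feature: white left half minus black right half."""
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return white - black


# Tiny made-up "image" just to exercise the functions
img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))          # 6 + 7 + 10 + 11 = 34
print(two_rect_feature(ii, 0, 0, 4, 3))  # 33 - 45 = -12
```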

The name of the game is identifying which features best tell faces apart from non-faces. This is the result of training over an image dataset.

Training: Overview:

A schematic of the training process is shown below. Thousands (or tens of thousands) of images are processed, computing a number of features on them. Then this data, together with labels indicating which images are faces and which are not, is passed to the AdaBoost learning algorithm, which chooses the best features and assigns each a weight as a function of its error.

The best classifiers are saved for later use in detection.

Training: AdaBoost:

The following diagrams describe the workings of AdaBoost. Four images of faces and four backgrounds are used as training data in this example, and we are calculating two features only.

The colored discs hold the feature value of each image; blue means face and red means background. As seen in the diagram, feature values are sorted numerically, keeping track of which values belong to faces and which don’t. Disc size represents the “weight” of each image for error calculation. At the start, all image weights are equal. We’ll come back to this later.

Once the features are ordered, the first iteration begins:

  • First, for each feature, identify the best threshold (i.e. the one with minimum error) that separates both categories, as well as a polarity (i.e. whether faces fall above or below the threshold).
  • Then choose the best feature for the current iteration, the one with smallest error.

  • We may now turn this feature into a weak classifier by assigning it a weight: the lower its error, the bigger the weight. In the case of perfect classification (zero error), quite unlikely with a big dataset, the weight would be infinite, so it is capped.
  •  Finally, image weights are adjusted so that images that were incorrectly classified have a higher weight.
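The four steps above can be sketched as a single boosting round. This is a Python sketch under assumptions, not the project's MATLAB code: it uses the update rule from the Viola-Jones paper (beta = error / (1 - error), classifier weight = log(1 / beta)), and it limits the weight by clamping the error to a small `eps`, which is just one possible way to avoid the infinite weight. The feature values are made up, mirroring the four faces, four backgrounds, two features example.

```python
import math


def best_stump(values, labels, weights):
    """Lowest weighted error, threshold and polarity for one feature.

    The stump says "face" (1) when polarity * value < polarity * threshold.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total_face = sum(w for w, y in zip(weights, labels) if y == 1)
    total_bg = sum(w for w, y in zip(weights, labels) if y == 0)
    face_below = bg_below = 0.0  # weight already passed by the threshold
    best_err, best_thr, best_pol = float("inf"), None, None
    for k in range(len(order) + 1):
        if k > 0:
            i = order[k - 1]
            if labels[i] == 1:
                face_below += weights[i]
            else:
                bg_below += weights[i]
        if k == 0:
            thr = values[order[0]] - 1          # everything above the threshold
        elif k == len(order):
            thr = values[order[-1]] + 1         # everything below the threshold
        else:
            lo, hi = values[order[k - 1]], values[order[k]]
            if lo == hi:
                continue                        # cannot split equal values
            thr = (lo + hi) / 2
        err_pos = bg_below + (total_face - face_below)  # faces expected below
        err_neg = face_below + (total_bg - bg_below)    # faces expected above
        for err, pol in ((err_pos, 1), (err_neg, -1)):
            if err < best_err:
                best_err, best_thr, best_pol = err, thr, pol
    return best_err, best_thr, best_pol


def boost_round(features, labels, weights, eps=1e-10):
    """Pick the best feature, weight it, and re-weight the images."""
    total = sum(weights)
    weights = [w / total for w in weights]
    best = None
    for j, values in enumerate(features):
        err, thr, pol = best_stump(values, labels, weights)
        if best is None or err < best[0]:
            best = (err, j, thr, pol)
    err, j, thr, pol = best
    err = max(err, eps)  # assumption: clamp so a perfect feature stays finite
    beta = err / (1.0 - err)
    alpha = math.log(1.0 / beta)
    new_weights = []
    for i, w in enumerate(weights):
        predicted = 1 if pol * features[j][i] < pol * thr else 0
        # Correctly classified images shrink; mistakes keep their weight
        new_weights.append(w * beta if predicted == labels[i] else w)
    total = sum(new_weights)
    new_weights = [w / total for w in new_weights]
    stump = {"feature": j, "threshold": thr, "polarity": pol,
             "alpha": alpha, "error": err}
    return stump, new_weights


# Made-up values: two features, four faces (label 1), four backgrounds (label 0)
features = [
    [2, 3, 5, 6, 4, 7, 8, 9],
    [1, 9, 2, 8, 3, 7, 4, 6],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
stump, new_weights = boost_round(features, labels, [1 / 8] * 8)
print(stump)
print(new_weights)
```

With this data the first feature wins with one mistake out of eight, and after the update that single misclassified background carries roughly half of the total weight, so the next round is pushed to get it right.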

The next iteration commences with the updated weights, and the process continues until the desired number of classifiers is reached.

The final result is the strong classifier, shown below. Its threshold can be varied, as a trade-off between sensitivity and false positive rate.
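As a rough Python sketch of how such a strong classifier could be evaluated: each weak classifier casts a vote worth its weight, and the normalised total is compared against an adjustable threshold. The dictionary layout, the function names and the numbers are assumptions for illustration only.

```python
def weak_predict(stump, feature_values):
    """1 (face) when polarity * value < polarity * threshold, else 0."""
    value = feature_values[stump["feature"]]
    p = stump["polarity"]
    return 1 if p * value < p * stump["threshold"] else 0


def strong_score(stumps, feature_values):
    """Weighted vote of all weak classifiers, normalised to the range [0, 1]."""
    total_alpha = sum(s["alpha"] for s in stumps)
    vote = sum(s["alpha"] * weak_predict(s, feature_values) for s in stumps)
    return vote / total_alpha


def strong_predict(stumps, feature_values, threshold=0.5):
    """Lower thresholds accept more windows: more sensitive, more false positives."""
    return 1 if strong_score(stumps, feature_values) >= threshold else 0


# Made-up weak classifiers and feature values for one window
stumps = [
    {"feature": 0, "threshold": 6.5, "polarity": 1, "alpha": 2.0},
    {"feature": 1, "threshold": 2.5, "polarity": -1, "alpha": 1.0},
    {"feature": 2, "threshold": 0.0, "polarity": 1, "alpha": 1.0},
]
window_features = [5, 1, 3]
print(strong_score(stumps, window_features))
print(strong_predict(stumps, window_features, threshold=0.5))
print(strong_predict(stumps, window_features, threshold=0.6))
```

Here only the first weak classifier votes "face", so the window passes at the default threshold of 0.5 but is rejected at 0.6.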

Detection:

Finally, the detection process is outlined below. It is pretty straightforward after the training is done. The image is scanned, and each window evaluated. Those that are classified as faces are framed in red. Then comes non-maximum suppression, which consolidates multiple detections of the same face into one. Example images from here and here.
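The scan and the consolidation step can be sketched like this in Python. I am assuming a fixed-size square window and a greedy, overlap-based (IoU) non-maximum suppression, which is one common variant and not necessarily the one used in the project. The boxes and scores are made up.

```python
def sliding_windows(img_w, img_h, win, step):
    """Yield (x, y, w, h) for every square window that fits in the image."""
    for y in range(0, img_h - win + 1, step):
        for x in range(0, img_w - win + 1, step):
            yield (x, y, win, win)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Keep the best-scoring box of each group of overlapping detections."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Drop a box if it overlaps too much with one that is already kept
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return [boxes[i] for i in keep]


# Three overlapping detections of one face, plus a separate one
boxes = [(10, 10, 24, 24), (12, 11, 24, 24), (11, 12, 24, 24), (80, 40, 24, 24)]
scores = [0.9, 0.8, 0.7, 0.6]
print(non_max_suppression(boxes, scores))
print(len(list(sliding_windows(48, 24, win=24, step=12))))
```

In a real scan the score of each box would be the strong classifier output for that window, and the scan would be repeated at several window sizes to find faces of different scales.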

Note that this implementation has a single layer. One of the core ideas of Viola-Jones is a cascaded detector, in which many such layers are chained, tweaking each strong classifier threshold so that windows not likely to be faces get promptly discarded while the others progress down the cascade. Only the windows that make it through all layers are classified as faces. This gif is pretty illustrative.
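Since the project itself stops at one layer, the following is only a sketch of the cascade idea in Python. Each layer is represented as a hypothetical (score function, threshold) pair, and the stage scores are made-up numbers standing in for real strong classifier outputs.

```python
def cascade_classify(layers, window):
    """Run a window through the layers, stopping at the first rejection.

    layers -- list of (score_fn, threshold) pairs, cheapest layer first
    Returns (is_face, number_of_layers_evaluated).
    """
    for depth, (score_fn, threshold) in enumerate(layers, start=1):
        if score_fn(window) < threshold:
            return False, depth  # rejected early, later layers never run
    return True, len(layers)


# Stand-in layers: each one just reads a precomputed, made-up score
layers = [
    (lambda w: w["stage1"], 0.2),
    (lambda w: w["stage2"], 0.5),
    (lambda w: w["stage3"], 0.7),
]
background = {"stage1": 0.1, "stage2": 0.9, "stage3": 0.9}
face = {"stage1": 0.6, "stage2": 0.7, "stage3": 0.8}
print(cascade_classify(layers, background))  # (False, 1)
print(cascade_classify(layers, face))        # (True, 3)
```

The speed-up comes from the fact that most windows in an image are background and leave after the first, cheapest layer.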

Finally, some numbers: how good is it?

Not bad, but it is pretty slow. The single-layer approach and MATLAB are to blame here, with the images above taking about 60 s each.

Subtleties:

There are many, many things we haven’t looked at: Feature calculation, image normalization, data structures… as always, the devil is in the details.
