Face Attributes Extraction using Deep Learning

One of the most important applications of computer vision is the processing of facial data. Face detection and recognition are widely used in many applications; to learn more about SEEEN face recognition, you can refer to this post.

In this article, we are going to talk about four of the most important applications of face-based classification algorithms:

Age detection
Gender detection
Ethnicity detection
Emotions or facial-expression detection

Age, gender, ethnicity and emotions are key facial attributes. They play fundamental roles in social interactions, which makes their estimation an important task in intelligent computer vision applications such as video analysis, access control, human-computer interaction, intelligent marketing and visual surveillance.

SEEEN Facial Analysis Pipeline

A typical facial analysis system can be divided into two main steps, displayed in the figure below.

Face detection: detecting all faces in an image.
Face analysis: classifying each face based on its attributes.

Generic Face Processing Pipeline

There are various ways to implement each of the steps in a face analysis pipeline. In our system, we perform face detection using MTCNN, then we apply three convolutional neural network (CNN) models in parallel to classify each face according to the attributes mentioned earlier (i.e. age, gender, ethnicity and emotions), as displayed in the next figure.

SEEEN’s Face Analysis Pipeline

The first step is to detect faces in the input image. All detected faces are then extracted, post-processed (resized and cropped), and sent to the face classifiers, which work in parallel. Each face classifier model assigns one of its categories to each face. Finally, the results are aggregated to form the final output.
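The classify-and-aggregate stage described above can be sketched as follows. This is only an illustration: the three classifier functions are hypothetical stand-ins that return fixed values, and using a thread pool for the parallelism is an assumption, since the article does not say how the models are scheduled.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three CNN classifiers. Each takes a face crop
# and returns a dict of attributes; real models would run inference here.
def classify_age_gender(face):
    return {"age": 30, "gender": "female"}

def classify_ethnicity(face):
    return {"ethnicity": "Asian"}

def classify_emotion(face):
    return {"emotion": "Happy"}

CLASSIFIERS = (classify_age_gender, classify_ethnicity, classify_emotion)

def analyze_faces(faces):
    """Run every classifier on every face crop and merge the results per face."""
    results = []
    with ThreadPoolExecutor(max_workers=len(CLASSIFIERS)) as pool:
        for face in faces:
            # Submit the three classifiers for this face so they run concurrently.
            futures = [pool.submit(clf, face) for clf in CLASSIFIERS]
            merged = {}
            for future in futures:
                # Aggregate each classifier's output into one record per face.
                merged.update(future.result())
            results.append(merged)
    return results
```

Calling analyze_faces with a list of face crops returns one merged dictionary per face, in the same order as the input.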

In videos, every detected face is tracked and assigned a track_id that helps to identify the same person over a sequence of consecutive frames.
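The article does not describe which tracking method is used. As one possible illustration of how a track_id could be carried from frame to frame, here is a minimal sketch that greedily matches boxes by intersection over union (IoU). The SimpleTracker class, the (x1, y1, x2, y2) box format and the 0.3 threshold are all assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class SimpleTracker:
    """Hypothetical greedy IoU tracker; not the tracker used in SEEEN."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track_id -> box seen in the previous frame
        self.next_id = 0

    def update(self, boxes):
        """Return one track_id per box, in the order the boxes were given."""
        assigned = []
        unmatched = dict(self.tracks)
        new_tracks = {}
        for box in boxes:
            # Find the not-yet-matched previous box that overlaps this one the most.
            best_id, best_iou = None, self.iou_threshold
            for track_id, prev_box in unmatched.items():
                score = iou(box, prev_box)
                if score >= best_iou:
                    best_id, best_iou = track_id, score
            if best_id is None:
                # No sufficient overlap: start a new track.
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]
            new_tracks[best_id] = box
            assigned.append(best_id)
        # Tracks with no match in this frame are dropped.
        self.tracks = new_tracks
        return assigned
```

Greedy matching in detection order is simple but not optimal when faces overlap, and this sketch forgets a face as soon as it is missed for one frame; a production tracker would typically handle both cases more carefully.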

Face detection

The Multi-Task Cascaded Convolutional Neural Networks (MTCNN) is a neural network architecture that detects faces and facial landmarks (five key points of a face) in images. It is one of the most popular face detection tools today.

In SEEEN, we feed video frames to MTCNN and obtain the bounding box of each face in the frame along with its facial landmarks. This information is then used to extract (crop) each face image and feed it to the subsequent face classification algorithms.
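The cropping step can be sketched as below. The box format (x1, y1, x2, y2) in pixels, the margin parameter, and the helper names crop_box and crop_face are assumptions for illustration; MTCNN implementations differ in how they report boxes, so the unpacking may need adapting. The frame is treated as a row-major sequence of pixel rows.

```python
def crop_box(box, frame_width, frame_height, margin=0.2):
    """Expand a face box by a relative margin and clamp it to the frame.

    box is assumed to be (x1, y1, x2, y2) in pixels.
    Returns integer (left, top, right, bottom) suitable for slicing.
    """
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin
    left = max(0, int(round(x1 - pad_x)))
    top = max(0, int(round(y1 - pad_y)))
    right = min(frame_width, int(round(x2 + pad_x)))
    bottom = min(frame_height, int(round(y2 + pad_y)))
    return left, top, right, bottom


def crop_face(frame, box, margin=0.2):
    """Cut the face region out of a frame stored as a list of pixel rows."""
    height, width = len(frame), len(frame[0])
    left, top, right, bottom = crop_box(box, width, height, margin)
    return [row[left:right] for row in frame[top:bottom]]
```

Clamping matters for faces near the frame border, where a padded box would otherwise extend past the image.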

MTCNN Detection Result Example

Face Analysis

In our approach, the analysis step is done using three convolutional neural network classifiers that run in parallel on the faces detected in the previous step:

1. Age and gender classification

We have trained a CNN model that predicts age and gender from an input human face image. The predicted age is an integer between 0 and 100, and the gender categories are “male” and “female”.

2. Ethnicity classification

A CNN model that predicts the ethnicity from an input human face image. The faces are categorized into five categories: White, Black, Asian, Indian and Hispanic. We assign the ethnicity with the highest prediction score to the image.

3. Emotion classification

A CNN model that predicts the emotion or facial expression from an input human face image. The faces are categorized into six categories: Angry, Happy, Neutral, Sad, Surprised and Other.
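The rule of assigning the category with the highest prediction score can be sketched as a simple argmax over the classifier's output scores. The label order below follows the lists in this article; whether the models emit scores in that order is an assumption, as is the helper name top_label.

```python
# Label order is assumed to match the order in which the models emit scores.
ETHNICITY_LABELS = ("White", "Black", "Asian", "Indian", "Hispanic")
EMOTION_LABELS = ("Angry", "Happy", "Neutral", "Sad", "Surprised", "Other")


def top_label(scores, labels):
    """Return the (label, score) pair with the highest prediction score."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return labels[best_index], scores[best_index]
```

For example, top_label([0.05, 0.10, 0.70, 0.10, 0.05], ETHNICITY_LABELS) selects the third label, "Asian", together with its score.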

Technical specifications

SEEEN’s facial analysis pipeline is implemented with PyTorch, Keras, TensorFlow and OpenCV.

The code can run on one or multiple CPUs or GPUs. Tested on a single Nvidia RTX 2080 Ti GPU, it runs in near real time (i.e. a 1-minute video takes about 1 minute to process).
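The near-real-time claim can be expressed as a real-time factor, the ratio of processing time to video duration, where a value of about 1.0 means processing keeps pace with playback. The sketch below also shows the per-frame time budget this implies; the helper names are hypothetical, and the frame rate is an input because the article does not state one.

```python
def real_time_factor(processing_seconds, video_seconds):
    """Ratio of processing time to video duration; 1.0 means real time."""
    if video_seconds <= 0:
        raise ValueError("video_seconds must be positive")
    return processing_seconds / video_seconds


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, to keep up with playback."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps
```

For instance, a 60-second video processed in 60 seconds gives a factor of 1.0, and a video at 25 frames per second leaves 40 ms for detection and classification of each frame.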
