The last decade has witnessed a revolution in deep learning methods that take advantage of the computing power of GPUs and the abundance of quality annotated data. Challenging problems such as image classification and object detection can now be solved with high accuracy. While deep learning methods are rapidly improving and proving effective in computer vision tasks, curating and updating the datasets for such tasks remains challenging in some application areas.
A particular case of this challenge arises when using deep learning models to perform image recognition. One of the most popular architectures for a recognition system comprises two steps:
- Transform a dataset of labelled images into distinct numerical representations called embeddings.
- Extract the embeddings from new unseen unlabelled images, and measure the similarity between the new embeddings and those of the labelled dataset, in order to assign labels to the new images.
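The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy 3-dimensional vectors, the cosine similarity measure, the `assign_label` helper and the 0.8 threshold are all hypothetical choices made for the example, not a description of SEEEN's system.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def assign_label(query, gallery, threshold=0.8):
    """Return (label, score) for the most similar labelled embedding.

    The label is None when even the best match scores below `threshold`
    (an assumed cut-off; it would need tuning on real data).
    """
    best_label, best_score = None, -1.0
    for label, embedding in gallery:
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score

# Step 1: the labelled dataset, already transformed into (toy) embeddings.
gallery = [
    ("person_a", [0.9, 0.1, 0.0]),
    ("person_a", [0.8, 0.2, 0.1]),
    ("person_b", [0.0, 0.2, 0.9]),
]

# Step 2: embeddings of new, unlabelled images are compared to the gallery.
label, score = assign_label([0.85, 0.15, 0.05], gallery)
unknown_label, unknown_score = assign_label([0.5, 0.5, 0.5], gallery)
```

The first query lies close to the `person_a` embeddings and receives that label, while the second is not similar enough to any labelled embedding and is left unlabelled.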
One optional preprocessing step that can be applied before the first step above is object detection. This step is required when recognising specific objects such as faces, logos or car models. In order to recognise objects, their candidate bounding boxes first need to be detected. Domain-specific detectors can be used to extract these bounding boxes and feed the cropped regions as images to the following steps in the processing pipeline.
In a previous article, we demonstrated SEEEN’s face recognition system, which follows the aforementioned steps (including a face detection algorithm). In this article we demonstrate the SEEEN platform for efficient image annotation, which is used to build recognition systems.
To feed a recognition system with the set of objects to recognise, an annotation tool is required that allows fast and accurate curation and annotation. SEEEN has adopted and customised PixPlot for that purpose.
PixPlot (dhlab.yale.edu/projects/pixplot) is a tool for visualizing high-dimensional data. It uses dimensionality reduction techniques such as t-SNE or UMAP to project high-dimensional vectors into 2D or 3D space while preserving, as far as possible, the spatial relations between these vectors.
Before a recognition system can start recognising images, we need to collect examples and compute their embeddings. These embeddings are then inserted into a dataset for later recognition.
Building such a dataset can be time consuming and error prone, especially when dealing with large collections of data. To facilitate this process, we decided to rely on PixPlot to handle the construction of the dataset by annotating images in bulk while giving the user the ability to correct errors.
PixPlot then processes these high-dimensional embeddings and projects them onto a 2D space, as illustrated in the image below (left). We can see a pattern in the projected data where some points are grouped together in a cluster. Ideally, each cluster would contain images of a single instance of an object (e.g. the faces of the same person).
PixPlot can use a common clustering method such as KNN or DBSCAN to compute these clusters. The image below (right) shows an example of a cluster containing automatically grouped faces.
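To make the clustering step more concrete, here is a minimal from-scratch sketch of the DBSCAN idea applied to points that have already been projected to 2D. It is not PixPlot's own code: the `dbscan` and `region_query` functions, the toy points and the `eps` and `min_samples` values are assumptions chosen for illustration, and in practice a library implementation would normally be used.

```python
import math

NOISE = -1  # cluster id given to points that belong to no cluster

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_samples):
    """Return one cluster id per point; NOISE marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later become a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is also a core point
    return labels

# Toy 2D projection: two dense groups and one isolated point.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (9.0, 0.0),
]
cluster_ids = dbscan(points, eps=0.3, min_samples=3)
```

With these toy values the first four points form one cluster, the next three form a second, and the isolated point is marked as noise, which is the kind of image a reviewer would expect to find outside any cluster.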
SEEEN’s Customisation of PixPlot
PixPlot's ability to group similar images inspired us to use it as an annotation assistant to help label large amounts of data. At SEEEN we have customised PixPlot and used it for two tasks:
- To visualize clusters of images in order to validate the effectiveness of our embedding extraction algorithm. A good embedding algorithm should produce clean clusters (i.e. each cluster contains images belonging to the same category or instance). If the clusters contain random data or data of mixed categories, we consider the embedding to be bad.
- To clean and label clusters. As the clusters are detected automatically from the embeddings, some clustering errors can occur even with good algorithms. We customised PixPlot to allow the quick selection of multiple images in a cluster so that it can be cleaned by excluding images judged to be irrelevant. Once the cluster is clean, it is easy to assign it a label that represents the category or object instance the cluster contains. For example, cluster X can be labeled as “Person Y” because it contains faces of that person. The images and labels are then uploaded to cloud storage (a new feature that we implemented).
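The clean-then-label workflow described in the second task can be modelled with a small data structure. The sketch below is purely illustrative: the `Cluster` class, the `clean_cluster` and `label_cluster` helpers and the file names are hypothetical, and the upload step is left out because the storage interface is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Cluster:
    cluster_id: int
    image_paths: List[str]
    label: Optional[str] = None

def clean_cluster(cluster, rejected):
    """Return a new cluster without the images the reviewer rejected."""
    rejected = set(rejected)
    kept = [p for p in cluster.image_paths if p not in rejected]
    return Cluster(cluster.cluster_id, kept, cluster.label)

def label_cluster(cluster, label):
    """Label the whole cluster and return one annotation record per image."""
    cluster.label = label
    return [{"image": p, "label": label} for p in cluster.image_paths]

# An automatically detected cluster with one face that does not belong.
cluster_x = Cluster(
    cluster_id=7,
    image_paths=[
        "faces/0001.jpg",
        "faces/0002.jpg",
        "faces/0003.jpg",  # judged irrelevant by the reviewer
        "faces/0004.jpg",
    ],
)

cleaned = clean_cluster(cluster_x, rejected=["faces/0003.jpg"])
records = label_cluster(cleaned, "Person Y")
```

A single label assignment produces one annotation record per remaining image, which is where the time saving over image-by-image annotation comes from.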
Our customisation of PixPlot makes our annotation process effective and efficient, as it lets us review and annotate clusters of images rather than individual ones. The next figure shows an example of how efficient it is to visualise a cluster of images, clean it and then assign a label to it.
Efficient and effective image annotation is crucial for building visual recognition systems. At SEEEN we have chosen to adopt and customise PixPlot so that we can review the effectiveness of our embedding algorithms and annotate and clean clusters of images efficiently.