PERICLES Concept Detection and SALIC
In this blog post we introduce PERICLES Concept Detection (PeriCoDe), a proof-of-concept prototype that facilitates image retrieval from large repositories by using the SALIC (Social Active Learning for Image Classification) approach to train concept detectors automatically, without requiring significant annotation effort. The prototype has been developed by the Multimedia Knowledge and Social Media Analytics Laboratory (MKLab) of CERTH/ITI within PERICLES Work Package (WP) 4: Capturing content semantics and environment.
Machine Learning - Supervised
Machine learning simulates the way humans learn to recognize objects. A model is trained to recognize a concept by providing a set of positive and negative examples (the training set). For example, in visual object classification, to train a model for a duck we need to provide a set of images that depict ducks (positive examples) and a set of images that do not (negative examples). From these, the machine learning algorithm learns the characteristics that differentiate ducks from other visual objects. The performance of a machine learning model depends mainly on the quality and the quantity of the training set. Quality is typically achieved through manual annotation (i.e. dedicated experts are asked to annotate a set of images with respect to their content), which is a laborious and time-consuming task. This cost has a direct impact on the second condition, i.e. quantity.
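The supervised setup above can be sketched in a few lines of Python. This is a minimal illustration with synthetic two-dimensional "features" and a nearest-centroid rule standing in for a real classifier; in practice the features would be extracted from the images themselves (e.g. CNN activations), and the model would be far more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D "visual features": positives (ducks) cluster around (1, 1),
# negatives (everything else) around (-1, -1).
positives = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(50, 2))
negatives = rng.normal(loc=[-1.0, -1.0], scale=0.3, size=(50, 2))

# A minimal supervised model: nearest-centroid classification.
pos_centroid = positives.mean(axis=0)
neg_centroid = negatives.mean(axis=0)

def predict(x):
    """Return True if x is closer to the positive (duck) centroid."""
    return np.linalg.norm(x - pos_centroid) < np.linalg.norm(x - neg_centroid)

print(predict(np.array([0.9, 1.1])))    # a point near the positive cluster
print(predict(np.array([-1.0, -0.8])))  # a point near the negative cluster
```

The quality/quantity trade-off is visible even here: with too few or mislabeled examples, the centroids drift and the decision rule degrades.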
Active Learning – What is it and why do we need it?
In training a classifier to recognize ducks, we first need to gather a set of positive and a set of negative examples. While there is a huge pool of unlabeled images on the web (1.8 billion images are uploaded daily), it is practically impossible to include them all in the training set for two main reasons. First, training the classifier on such a high volume of data would be computationally intractable. Second, even if we could overcome the technical issues, manual annotation at this scale is prohibitively expensive and laborious. The essential question, though, is: do we need all these images to sufficiently describe a duck? Consider the negative examples: many of them depict similar objects, e.g. there are many images of the sky. Do we really need thousands of sky images to define what a duck is not?

Considering this, and in an effort to minimize the labeling cost, active learning attempts to identify only the useful part of the data, i.e. the examples that will provide significant information to the classification model so that it can learn better representations of the desired object (the duck in this case). The first step in active learning is to gather a small set of labeled examples, selected randomly (typically from one to a few hundred per class), and train an initial classification model. Based on this model, the algorithm then actively selects examples to be labeled from a large pool of unlabeled examples (also known as the pool of candidates), based on their informativeness. There are many methods to calculate this informativeness measure for each unlabeled example. One of the most popular is based on the uncertainty assumption: it would be most beneficial to the classification model if we knew the labels of the examples whose content it is most uncertain about.
After the informative examples are selected, they are given to an oracle (typically a dedicated expert) to annotate as positive or negative, and they are added to the training set. Active learning is an iterative process: a few examples are added in each iteration, and the classifier is retrained each time.
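The iterative loop described above can be sketched as follows. This is a simplified illustration, not the PERICLES implementation: a nearest-centroid rule stands in for the classifier, synthetic 2-D points stand in for image features, and the oracle is simulated by hidden ground-truth labels.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_centroids(X, y):
    """Stand-in for classifier training: one centroid per class."""
    return X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)

def decision_score(X, pos_c, neg_c):
    """Signed score: > 0 leans positive; magnitude acts as confidence."""
    d_neg = np.linalg.norm(X - neg_c, axis=1)
    d_pos = np.linalg.norm(X - pos_c, axis=1)
    return d_neg - d_pos

# Small randomly selected labeled seed set plus a large unlabeled pool.
X_seed = np.vstack([rng.normal(1, 0.3, (5, 2)), rng.normal(-1, 0.3, (5, 2))])
y_seed = np.array([1] * 5 + [0] * 5)
pool = rng.normal(0, 1.2, (200, 2))
pool_labels = (pool.sum(axis=1) > 0).astype(int)  # hidden truth = "oracle"

X, y = X_seed.copy(), y_seed.copy()
for _ in range(5):  # a few active-learning iterations
    pos_c, neg_c = train_centroids(X, y)
    # Uncertainty sampling: query the pool items closest to the boundary.
    uncertainty = -np.abs(decision_score(pool, pos_c, neg_c))
    query = np.argsort(uncertainty)[-3:]   # the 3 most uncertain examples
    X = np.vstack([X, pool[query]])        # the oracle labels them
    y = np.concatenate([y, pool_labels[query]])
    pool = np.delete(pool, query, axis=0)
    pool_labels = np.delete(pool_labels, query)

print(len(y))  # 10 seed + 5 iterations x 3 queries = 25 labeled examples
```

Note how few labels the loop consumes: 25 annotations rather than the full 200-image pool, which is the whole point of active learning.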
SALIC – What is it?
SALIC (Social Active Learning for Image Classification) is an active learning algorithm that attempts to completely omit the process of manual annotation (i.e. the human oracle of typical active learning). This is achieved by replacing the pool of unlabeled images with user-tagged images (e.g. Flickr images), so that the tags can serve as indicators of the actual content of the images. The naive approach, e.g. for the duck case, would be to annotate all images tagged with the keyword duck as positive and all the rest as negative. However, this entails two significant problems. First, the amount of data gathered this way is huge, which makes training the classifier impractical due to the computational complexity of most machine learning algorithms. Second, social tagging is known to be quite noisy [ref], so taking the user tags as labels would create a noisy dataset with many false positives and false negatives. To alleviate this problem, SALIC uses the more sophisticated bag-of-words model, which takes into account the context of the tags and assigns to each image a probability (confidence score) indicating whether it contains a duck or not. This confidence score is then combined with the informativeness of the image through a probabilistic fusion approach. The combined probability indicates which images are both informative for the classifier and likely to be correctly annotated by the user tags. Finally, the top N images are selected (N for the positive class and N for the negative class) and added to the training set. The classifier is retrained and the process is repeated iteratively.
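One way to picture the selection step is the sketch below. The product fusion used here is an assumption for illustration, not SALIC's exact probabilistic fusion formula, and both the tag confidences and the informativeness scores are hypothetical random values standing in for the outputs of the bag-of-words model and the classifier.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 1000
# p_tag: probability that the image truly depicts the concept, as estimated
# from its social tags (in SALIC, via a bag-of-words model over tag context).
p_tag = rng.uniform(size=n)
# informativeness: classifier uncertainty in [0, 1], highest for images
# near the decision boundary (hypothetical scores for illustration).
informativeness = rng.uniform(size=n)

# Product fusion (an illustrative assumption): the score is high only when
# the image is BOTH likely a true positive and informative to the model.
fused = p_tag * informativeness

N = 10
top_positive = np.argsort(fused)[-N:]  # top-N candidates for the positive class
print(len(top_positive))
```

The same ranking, applied to (1 - p_tag), would yield the top-N candidates for the negative class; both sets are then added to the training set without any human in the loop.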
SALIC is the PERICLES proposal for training concept detectors for multiple concepts in a scalable way: it alleviates the manual annotation typically required to train such models, and it minimizes the required training set, thereby decreasing the computational cost of training. Since SALIC is a general scientific method, it can be applied in many different contexts depending on the type of data used to train the classifiers.
A paper on SALIC has been published in the journal IEEE Transactions on Multimedia.
Who is it for?
The main target audience of SALIC is researchers in the multimedia analysis domain. However, it can be useful to anyone who wants to train a detector for any concept based on visual content. It is particularly useful when we do not have sufficient manually labeled images for our concepts. The code is available online along with documentation and an installation guide, and is distributed freely under the Apache 2.0 license.
PeriCoDe is the application of the general scientific method (SALIC) to the art domain, which is one of the two case studies of the PERICLES project. It is essentially the visual concept detection tool of PERICLES. Visual concept detection, as mentioned above, is accomplished by applying machine learning models that have been trained to detect specific concepts to images (e.g. paintings). PeriCoDe can facilitate image retrieval in large collections, as it aims to relieve the end user of the burden of manually searching images by content, using a set of visual analysis algorithms trained to detect similarities.
Find out more about the theory behind PeriCoDe in this deliverable.