Content-Based Image Retrieval


The recent rapid growth in computer technology has brought a substantial increase in the storage of digital imagery. Examples of applications can be found in everyday life, from museums archiving images or manuscripts to medicine, where millions of images are generated by radiologists every year.

Storage of such image data is relatively straightforward, but accessing and searching image databases is intrinsically harder than searching their textual counterparts. The goal of Content-Based Image Retrieval (CBIR) systems is to operate on collections of images and, in response to visual queries, retrieve the relevant images. The application potential of CBIR for fast and effective image retrieval is enormous, expanding the use of computer technology to a management tool.

The more realistic approach taken in the early 1990s was to work instead with simple low level features, such as the colour histograms used by Swain and Ballard. Since then many more sophisticated methods have been developed. However, due to the difficulties involved, most practical approaches are still rooted in low level feature extraction and description.
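As an illustration of the colour histogram approach, the following is a minimal sketch of histogram intersection in the style of Swain and Ballard. The bin count, the RGB quantisation and the function names are assumptions made for the example, not details taken from the project.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Count RGB pixels (0-255 per channel) into a coarse, flattened 3-D histogram."""
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Clamp so that bin counts that do not divide 256 stay in range.
        ri = min(r // step, bins_per_channel - 1)
        gi = min(g // step, bins_per_channel - 1)
        bi = min(b // step, bins_per_channel - 1)
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
    return hist


def histogram_intersection(query_hist, model_hist):
    """Sum of bin-wise minima, normalised by the number of pixels in the model."""
    total = sum(model_hist)
    if total == 0:
        return 0.0
    return sum(min(q, m) for q, m in zip(query_hist, model_hist)) / total
```

A score of 1.0 means the model's colour distribution is fully accounted for by the query; images are ranked by this score.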

Further Investigation of Features and Combinations

The first area of research tackled in the project covered alternative low level features that could capture more of the spatial aspects of the image than colour histograms do, but preferably without requiring object identification, or even segmentation.

Hue vs Colour Labels

Colour is the most widely used feature in CBIR. As an alternative to the standard Hue Saturation Luminosity (HSL) space, the colour space was partitioned into Berlin and Kay's 11 "universal" categories, i.e. achromatic (Black, White, Gray) and chromatic (Red, Green, Blue, Yellow, Purple, Orange, Pink and Brown) labels. The coarser quantisation and the perceptual categories benefited retrieval.
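One simple way to obtain such a label histogram is sketched below, assuming each pixel is assigned to the nearest of eleven prototype colours in RGB space. The prototype RGB values and the nearest-neighbour rule are illustrative assumptions; the partition actually used in the project is not specified here.

```python
# Hypothetical RGB prototypes for the eleven Berlin and Kay basic colour terms.
PROTOTYPES = {
    "black": (0, 0, 0), "white": (255, 255, 255), "gray": (128, 128, 128),
    "red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255),
    "yellow": (255, 255, 0), "purple": (128, 0, 128), "orange": (255, 165, 0),
    "pink": (255, 192, 203), "brown": (139, 69, 19),
}


def colour_label(rgb):
    """Return the name of the prototype closest to rgb (squared Euclidean distance)."""
    return min(
        PROTOTYPES,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PROTOTYPES[name])),
    )


def label_histogram(pixels):
    """Fraction of pixels assigned to each of the eleven labels."""
    counts = dict.fromkeys(PROTOTYPES, 0)
    for rgb in pixels:
        counts[colour_label(rgb)] += 1
    n = len(pixels)
    return {name: (c / n if n else 0.0) for name, c in counts.items()}
```

The resulting histogram has only eleven bins, which is the coarse quantisation referred to above.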

Multi-resolution Salience Distance Transform

Shape information was introduced using the multi-resolution salience distance transform applied to edges to generate histograms of distances to edges. This enables histogram matching to respond differently to different types of shapes.
[Figure: distance transform of colour region boundaries and of intensity edges]
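The idea of histogramming distances to edges can be sketched as follows. Note that this uses a plain two-pass city-block distance transform on a binary edge map, not the multi-resolution salience distance transform used in the project; it is only meant to show how a distance map becomes a histogram feature.

```python
def cityblock_distance_transform(edges):
    """edges: 2-D list of 0/1. Returns city-block distance to the nearest edge pixel."""
    h, w = len(edges), len(edges[0])
    far = h + w  # larger than any possible distance inside the image
    d = [[0 if edges[y][x] else far for x in range(w)] for y in range(h)]
    for y in range(h):  # forward pass: propagate from the top and left
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    for y in reversed(range(h)):  # backward pass: propagate from the bottom and right
        for x in reversed(range(w)):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d


def distance_histogram(dist, n_bins):
    """Histogram of distance values; distances beyond the last bin are clamped into it."""
    hist = [0] * n_bins
    for row in dist:
        for v in row:
            hist[min(v, n_bins - 1)] += 1
    return hist
```

Compact, cluttered shapes concentrate mass in the low-distance bins, while images with large empty regions spread it towards higher distances.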

Segmentation by Thresholding

Although automatic general segmentation is difficult and unreliable, an approach based on binary thresholding was developed. Even if the segmented regions do not correspond to high-level objects in the scene, they can still be useful for injecting spatial information into the histogram description. Specifically, the two image classes (black and white) defined two masks. Histograms were computed separately from each masked area, and standard histogram-based CBIR was then applied. To reduce sensitivity to the threshold value, a fuzzified version was implemented.
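A minimal sketch of the two-mask histogram follows. It assumes the fuzzification takes the form of a linear ramp around the threshold, so that pixels near the threshold contribute partly to both histograms; the ramp shape, its width parameter and the function names are assumptions for illustration.

```python
def white_membership(value, threshold, width=0):
    """Degree (0..1) to which a grey value belongs to the 'white' class.

    width <= 0 gives a crisp threshold; otherwise a linear ramp of
    half-width `width` centred on the threshold.
    """
    if width <= 0:
        return 1.0 if value >= threshold else 0.0
    t = (value - threshold) / (2.0 * width) + 0.5
    return max(0.0, min(1.0, t))


def masked_histograms(grey, feature_bins, threshold, n_bins, width=0):
    """Concatenate the feature histograms of the 'black' and 'white' masks.

    grey: flat list of grey values; feature_bins: flat list of per-pixel
    feature bin indices (for example colour labels), same length as grey.
    """
    black = [0.0] * n_bins
    white = [0.0] * n_bins
    for g, f in zip(grey, feature_bins):
        w = white_membership(g, threshold, width)
        white[f] += w
        black[f] += 1.0 - w
    return black + white
```

The concatenated histogram can then be compared with any standard histogram distance, exactly as for a single global histogram.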

Local vs. Global Statistics

Another approach experimented with histogramming the relation between local statistical image information and the corresponding global image information. Thresholding was applied both globally to the image and locally to individual windows. The percentage difference between the window content and the image content at the corresponding position was then histogrammed. Additionally, the amount of blackness found in the window content after thresholding was histogrammed.
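One possible reading of this scheme is sketched below. It assumes non-overlapping square windows, the mean grey value as the threshold in both the global and the local case, and "percentage difference" taken as the fraction of window pixels whose black/white class differs between the two thresholdings. All of these choices are assumptions, as the details are not given above.

```python
def _mean(values):
    return sum(values) / len(values)


def _bin(fraction, n_bins):
    """Map a fraction in [0, 1] to a bin index, placing 1.0 in the last bin."""
    return min(int(n_bins * fraction), n_bins - 1)


def local_global_histograms(image, win, n_bins=10):
    """image: 2-D list of grey values; win: window side length.

    Returns (difference histogram, blackness histogram), one count per window.
    """
    h, w = len(image), len(image[0])
    global_t = _mean([v for row in image for v in row])
    diff_hist = [0] * n_bins
    black_hist = [0] * n_bins
    for y0 in range(0, h - win + 1, win):
        for x0 in range(0, w - win + 1, win):
            block = [image[y][x]
                     for y in range(y0, y0 + win)
                     for x in range(x0, x0 + win)]
            local_t = _mean(block)
            # Pixels classified differently by the global and local thresholds.
            differ = sum((v >= global_t) != (v >= local_t) for v in block)
            # Pixels that are black under the local threshold.
            black = sum(v < local_t for v in block)
            diff_hist[_bin(differ / len(block), n_bins)] += 1
            black_hist[_bin(black / len(block), n_bins)] += 1
    return diff_hist, black_hist
```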

Delaunay Triangulation

Finally, another method to measure shape indirectly started with the edge map. Some tidying up was first carried out to eliminate spurious short edge lists, the edges were subsampled, and the Delaunay triangulation of the remaining edge points was then computed. The strength of this approach is that connectivity is used to help filter out noise, yet the triangulation itself does not depend on connectivity and can therefore cope with edge linking errors. The following properties of the individual triangles were histogrammed and used to describe the image structure: area, aspect ratio and length.
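The per-triangle measurements can be sketched as follows, assuming the triangulation itself is already available as a list of vertex triples (for instance from a computational geometry library). The definitions used here are assumptions: aspect ratio is taken as longest side over shortest side, and "length" as the longest side; the project may have defined them differently.

```python
import math


def triangle_properties(a, b, c):
    """Return (area, aspect ratio, longest side) for a triangle with 2-D vertices a, b, c."""
    sides = sorted([math.dist(a, b), math.dist(b, c), math.dist(c, a)])
    # Half the absolute cross product of two edge vectors.
    area = abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0
    # Assumed definition: longest side divided by shortest side.
    aspect = sides[2] / sides[0] if sides[0] > 0 else float("inf")
    return area, aspect, sides[2]
```

Each of the three values, collected over all triangles of an image, would then be binned into its own histogram.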

Performance Evaluation / Validation

The second area of interest is performance evaluation. While many new image features and processing methodologies are generated in the area of CBIR, testing them was found to be problematic. Although CBIR is close to the Information Retrieval (IR) field, the complexity of image similarity does not allow immediate application of IR evaluation / validation techniques to CBIR. The subjective nature of image similarity and the dynamic scope of a query make objective evaluation of CBIR systems using simple methods such as recall/precision measures unreliable. Several approaches to evaluation were investigated.
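For reference, the recall/precision measures mentioned above can be computed for a single query as in this small sketch. It assumes the retrieved list contains no duplicates and that a ground-truth set of relevant images is available, which is precisely the part that is hard to obtain objectively.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a set of relevant items."""
    relevant = set(relevant)
    hits = sum(1 for item in retrieved if item in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```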

Visualisation of Results

The image similarity measurements generated by histogram comparison of the above features were input to the Pathfinder algorithm, which is a structural modelling technique developed in the field of psychology. This produced a network in which the data typically displayed clustering. The results were then rendered in VRML for visual assessment of the clusters. In addition, the results were compared against a manual clustering of the images.
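A compact sketch of Pathfinder network construction is given below. Pathfinder is parameterised by r and q; the sketch assumes the commonly used setting r = infinity and q = n - 1, under which a link is kept only if no indirect path between its endpoints has a smaller maximum link weight. The parameter values actually used in the project are not stated, and the input is assumed to be a symmetric distance matrix with a zero diagonal.

```python
def pathfinder_network(dist):
    """Return the links (i, j), i < j, of PFNET(r=inf, q=n-1) for a distance matrix."""
    n = len(dist)
    # minimax[i][j]: smallest achievable "largest link weight" over all paths i -> j,
    # computed with a Floyd-Warshall style update.
    minimax = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # Keep a direct link only if no indirect path beats it.
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] <= minimax[i][j]]
```

The surviving links form a sparse network over the images, which is what would then be laid out and rendered for visual inspection.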

Visualisation of System Parameters

Various schemes for plotting the contents of feature histograms and image distance matrices while systematically varying internal system parameters were applied, and distinctive behaviours were identified, leading to insights into the effectiveness of individual methods and their combinations.

Statistical Analysis

Rather than just recording average recall/precision values, the distribution of the values over the queries was investigated. We have found this to be much more revealing in determining the system's performance strengths and limitations. Statistical tests can be applied to check whether one algorithm or histogram feature is significantly better than another. Initially Student's t-test was considered, but the distributions were found to be non-normal. Work applying the non-parametric median test is currently underway.
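The median test for two sets of per-query scores can be sketched as follows. This version pools the samples, counts how many values in each sample lie above the grand median, and applies a chi-square test with one degree of freedom to the resulting 2x2 table. Counting ties as "not above" and omitting a continuity correction are choices made for the sketch, not details taken from the project.

```python
import math
import statistics


def median_test(sample_a, sample_b):
    """Two-sample median test. Returns (chi-square statistic, p-value)."""
    grand_median = statistics.median(list(sample_a) + list(sample_b))
    a = sum(v > grand_median for v in sample_a)  # sample A, above the median
    b = len(sample_a) - a                        # sample A, at or below
    c = sum(v > grand_median for v in sample_b)  # sample B, above the median
    d = len(sample_b) - c                        # sample B, at or below
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Here sample_a and sample_b would hold, for example, the precision obtained by two different features over the same set of queries. With small samples an exact test would be more appropriate than the chi-square approximation.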

Concept Oriented Image Retrieval

The final area of work aims to go beyond histograms and build a higher level image description. The intention is to investigate how a high level knowledge-based concept structure can be used to support high level queries. A set of images has been hand segmented and labelled. This is provided as training data, and local windows in unseen images are then labelled by a neural network using local histograms of their properties.

In addition to local properties, pairwise spatial relationships are now used to generate an association graph between the image windows and nodes in a "concept hierarchy". Matches are then found by applying a maximal clique graph algorithm. This enables searches to be based not just on spatial relationships but also on different levels of the concept hierarchy.
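The matching step can be sketched as follows. Each node of the association graph is a hypothetical (window, concept) pairing, two nodes are linked when a user-supplied compatibility test accepts them, and maximal cliques are enumerated with the basic Bron-Kerbosch algorithm. The node representation, the compatibility function and the choice of Bron-Kerbosch are assumptions; the project's own graph construction and clique algorithm are not described here.

```python
from itertools import combinations


def association_graph(nodes, compatible):
    """Adjacency sets linking every pair of nodes accepted by compatible(u, v)."""
    adj = {n: set() for n in nodes}
    for u, v in combinations(nodes, 2):
        if compatible(u, v):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques (Bron-Kerbosch without pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques
```

The largest clique gives the biggest mutually consistent set of window-to-concept assignments. In practice the compatibility test would compare the spatial relationship between two windows with the relationship expected between their concepts.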

