An Investigation into Explanations for Convolutional Neural Networks

Abstract

As deep learning techniques have become more prevalent in computer vision, the need to explain these so-called ‘black boxes’ has increased. Indeed, these techniques are now being developed and deployed in such sensitive areas as medical imaging, autonomous vehicles, and security applications. Being able to create reliable explanations of their operations is therefore essential. For images, a common method for explaining the predictions of a convolutional neural network is to highlight the regions of an input image that are deemed important. Many techniques have been proposed; however, these are often constrained to produce an explanation at a certain level of coarseness: explanations can be created that either score individual pixels or score large regions of an image as a whole, and it is difficult to create an explanation whose coarseness falls between these two extremes. A potentially even greater problem is that none of these explanation techniques has been designed to explain what happens when a network fails to make the correct prediction. In these instances, current explanation techniques are not useful.

In this thesis, we propose two novel techniques that can efficiently create explanations that are neither too fine nor too coarse. The first uses superpixels weighted with gradients to create explanations of any desired coarseness (within computational constraints). We show that we can efficiently produce explanations with higher accuracy than comparable existing methods, and that our technique can be used in conjunction with existing techniques such as LIME to improve their accuracy. The technique is subsequently shown to generalise well to networks that take video as input. The second technique creates multiple explanations using a rescaled input image, allowing finer features to be found. We show that it performs considerably better than comparable techniques on both accuracy and weak-localisation metrics. Using this technique, we also show that faithfulness, a commonly used metric, is flawed, and recommend that its use be discontinued.

Finally, we propose a third novel technique to address the issue of explaining failure, using the concepts of surprise and expectation. By building an understanding of how a model has learnt to represent the training data, we can begin to explore the reasons for failure. Using this technique, we show that we can highlight regions in the image that have caused failure, explore features that may be missing from a misclassified image, and provide an insightful method to explore an unseen portion of a dataset.
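For illustration only, the general idea of weighting superpixels with gradients can be sketched as below. This is not the thesis's actual method; the function name, the use of SLIC superpixels, plain input gradients, and a PyTorch classifier are all assumptions made for the sketch, and the coarseness of the resulting explanation is simply controlled by the number of superpixels requested.

```python
# Illustrative sketch (not the thesis's method): weight SLIC superpixels by
# input-gradient magnitude to obtain a region-level saliency map whose
# coarseness is controlled by n_segments.
import numpy as np
import torch
from skimage.segmentation import slic

def superpixel_gradient_saliency(model, image, target_class, n_segments=200):
    """image: float32 array of shape (H, W, 3) with values in [0, 1]."""
    x = torch.tensor(image.transpose(2, 0, 1)[None], dtype=torch.float32,
                     requires_grad=True)
    score = model(x)[0, target_class]           # logit of the class to explain
    score.backward()                            # gradients w.r.t. input pixels
    grad = x.grad[0].abs().sum(dim=0).numpy()   # per-pixel gradient magnitude

    segments = slic(image, n_segments=n_segments, start_label=0)
    saliency = np.zeros_like(grad)
    for s in np.unique(segments):
        mask = segments == s
        saliency[mask] = grad[mask].mean()      # one weight per superpixel
    return saliency                             # higher value = more important
```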

Publication
PhD Thesis, Cardiff University