Introduction

The field of artificial intelligence (AI) has gone through cycles of excitement and disillusionment, driven by breakthroughs that then failed to deliver on their initial promise. Today, AI has begun to permeate all aspects of our digital lives. This is largely due to the advent of machine learning (ML), which leverages the availability of vast amounts of data to ‘teach’ complex models to perform specialized tasks. Fields such as finance, medicine, retail, transport and even law are looking to leverage the insights that ML models can distill from data to improve performance and efficiency. Meanwhile, at an individual level, ML automatically categorizes our photos, predicts the next word we will type and provides recommendations while we read and shop. There is thus tremendous excitement about the potential of AI to fundamentally alter our lives, hopefully for the better.

These applications of ML are possible due to the development of algorithms and models that have achieved impressive, even ‘human-level’, performance in domains such as image recognition, natural language and speech processing, and game-playing. For tasks such as online recommendations and spam filtering, ML is now de rigueur. Paradigms such as supervised, unsupervised, online and reinforcement learning have emerged or been rediscovered to model and solve problems in different domains. Of these, perhaps the most popular and ubiquitous is supervised learning, where labeled data is used to train ML models to recognize patterns, with the aim that the learned patterns are general enough to allow the model to provide labels for new data. This is the dominant paradigm for image recognition and the one we focus on in this thesis. In many cases, however, there may be a lack of labeled data, necessitating the use of unsupervised learning, or the learning algorithm may need to continuously interact with a changing environment to obtain data samples, as in online and reinforcement learning. One of the main drivers behind the success of all of these learning paradigms is the use of deep learning to extract relevant features from data.
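
As a concrete illustration of the supervised setting, the following minimal sketch fits a classifier on labeled data and then measures how well the learned patterns generalize to held-out samples; the scikit-learn library and the digits dataset are used purely for illustration and are not tied to the rest of this thesis.

```python
# Minimal supervised-learning sketch: fit a classifier on labeled data,
# then check how well it generalizes to unseen samples.
# (scikit-learn and the digits dataset are placeholders for illustration.)
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)                 # features and labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)           # held-out data acts as 'new' inputs

clf = LogisticRegression(max_iter=1000)             # a simple classifier family
clf.fit(X_train, y_train)                           # learn patterns from labeled data
print("held-out accuracy:", clf.score(X_test, y_test))
```
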
Statistical learning theory provides the tools to analyze supervised machine learning, primarily through the paradigm of Probably Approximately Correct (PAC) learning and learning algorithms such as gradient descent. There are two main quantities of interest in statistical learning theory. The first is, given a particular family of classifiers (such as neural networks or linear classifiers), the minimum error achievable by any classifier in that family for a given distribution. The second is the number of samples from that distribution needed to learn a classifier from within the family that achieves an error close to this minimum. This quantity is typically known as the sample complexity. Together, these quantities characterize a supervised learning problem completely, since they determine what a good classifier family is, as well as how much data is needed to learn a classifier from within that family with close to the best possible performance. In standard statistical learning theory, the sample complexity of learning is determined by a metric known as the Vapnik-Chervonenkis (VC) dimension, which measures the expressiveness of the classifier family. In general, the more expressive the classifier family, the more samples are needed to learn a good classifier from it.
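
To make these two quantities concrete, they can be written as follows, with notation introduced here only for illustration: D is the data distribution, H the classifier family, d its VC dimension and R the 0-1 risk. The final expression is the standard agnostic PAC-learning sample complexity bound.

```latex
% Best achievable error within the family H on distribution D:
R^{*}_{H} = \inf_{h \in H} \Pr_{(x,y) \sim D}\bigl[ h(x) \neq y \bigr]

% Sample complexity: the smallest m(\epsilon, \delta) such that, given that many
% i.i.d. samples, the learned classifier \hat{h} satisfies, with probability
% at least 1 - \delta,
R(\hat{h}) \leq R^{*}_{H} + \epsilon

% For a family with VC dimension d, a standard agnostic PAC bound gives
m(\epsilon, \delta) = O\!\left( \frac{d + \log(1/\delta)}{\epsilon^{2}} \right)
```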


Adversarial Machine Learning

These successes, however, have led to major concerns about the fairness, privacy and security of ML systems. Privacy concerns are paramount at the data collection and deployment stages, where undesirable leakage of private information can occur, either about the individuals whose data was collected or about the model itself, which may be proprietary. There are also numerous ethical concerns about the fairness of ML systems, especially when they are deployed in critical decision-making tasks where they can have a disproportionate negative impact on minorities. Finally, the integration of ML into new and existing systems has exposed a new attack surface, one that is not reliant on hardware or software, but on the underlying algorithms and models themselves. Attackers can exploit the properties of these algorithms and models to interfere with either the training or inference phase, leading to undesirable or unexpected outcomes from the ML models. These security violations can, in turn, also lead to fairness and privacy concerns.

Attacks during the training phase are known as poisoning attacks, while those during the test phase are called evasion attacks. Evasion attacks have proved to be a particular source of concern, as systems are expected to be robust to small changes in the input. However, it has been discovered that both simple and complex ML models drastically change their output in response to imperceptibly modified inputs known as adversarial examples. Subsequent research uncovered adversarial example-based evasion attacks for a wide range of supervised learning tasks such as image classification, object detection, image segmentation and speech recognition. Attacks have been found to be feasible even for other learning paradigms such as generative modeling and reinforcement learning. While these attacks highlight the perniciousness and ubiquity of adversarial examples across all facets of machine learning, they do not establish whether adversarial examples are a serious threat to real-world systems.
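
As an illustration of how such an evasion attack is mounted when the attacker has white-box access to the model, the sketch below implements the well-known fast gradient sign method (FGSM); the model, input and perturbation budget are placeholders, and FGSM is shown only as one representative attack.

```python
# Sketch of the fast gradient sign method (FGSM), a standard white-box
# evasion attack: perturb the input in the direction that most increases
# the model's loss, within an L-infinity budget eps.
# (The model, input and label below are placeholders for illustration.)
import torch
import torch.nn as nn

def fgsm(model, x, y, eps):
    """Return an adversarial example within eps of x (L-infinity)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(x_adv), y)
    loss.backward()
    # Step in the sign of the input gradient and clip to a valid pixel range.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Placeholder model and data, just to make the sketch self-contained.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)                  # a stand-in 'image'
y = torch.tensor([3])                         # its assumed label
x_adv = fgsm(model, x, y, eps=0.1)
print("max perturbation:", (x_adv - x).abs().max().item())
```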

Real-world ML systems present two additional hurdles an attacker must overcome to carry out a successful attack. The first is that some of these systems are likely to process physical manifestations of objects before carrying out inference. This is particularly true of computer vision systems in applications such as self-driving cars and optical character recognition. For such systems, it is not enough to demonstrate that the underlying ML model changes its outputs when presented with adversarial examples; the perturbations used to generate these examples must also be robust to changes in environmental conditions and processing pipelines. These requirements led to a line of research on physical adversarial examples, which are generated while accounting for these additional constraints. The second hurdle is that, for a real-world ML system, the attacker is unlikely to have knowledge of the internals of the model being attacked, which is required for the generation of adversarial examples. Most ML models are proprietary, and if an attacker manages to gain access to the model via software and/or hardware vulnerabilities, then adversarial examples are no longer the most pertinent threat.

To overcome this hurdle, black-box evasion attacks were proposed. The first instantiations of these relied on the observation that adversarial examples are likely to transfer from one model to another if the models are similar enough. However, since model similarity depends on a number of factors such as the architecture, the choice of hyperparameters and the data used for training, transferability-based attacks have low effectiveness against deployed ML systems that allow inputs to be submitted for inference [Clarifai, Google Vision API, Watson Visual Recognition].
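
A transferability-based black-box attack can be sketched as follows: adversarial examples are crafted with white-box access to a locally trained substitute model and then submitted to the black-box target, in the hope that they transfer. The substitute, target and data below are untrained placeholders used only to make the sketch self-contained.

```python
# Sketch of a transferability-based black-box attack: craft adversarial
# examples against a local substitute model (white-box) and submit them to
# the target model, hoping they transfer. Models and data are placeholders.
import torch
import torch.nn as nn

def craft_on_substitute(model, x, y, eps):
    """FGSM-style perturbation computed on the attacker's substitute model."""
    x_adv = x.clone().detach().requires_grad_(True)
    nn.CrossEntropyLoss()(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

substitute = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # trained locally by the attacker
target = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))      # black-box, prediction-only access

x = torch.rand(8, 1, 28, 28)                       # stand-in for a batch of images
y = torch.randint(0, 10, (8,))                     # stand-in labels

x_adv = craft_on_substitute(substitute, x, y, eps=0.1)
with torch.no_grad():                              # only predictions are queried from the target
    fooled = (target(x_adv).argmax(dim=1) != y).float().mean()
print("fraction of adversarial examples that transfer:", fooled.item())
```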

To defend against evasion attacks, a wide variety of defenses have been proposed. One line of defenses focuses on using ensembling procedures to improve robustness, while another, perhaps the most popular, is based on altering the training procedure to account for the presence of adversarial examples during training, in particular through iterative adversarial training. The latter, however, relies on a careful specification of the adversary’s threat model in order to be effective. A further drawback of these empirical defenses is that they are often vulnerable to adaptive attacks, leading to an arms race between attacks and defenses. To circumvent this, several defenses have been proposed that are provably robust given a specification of the adversary’s capabilities. Unfortunately, provable robustness has only been shown for adversaries with very limited capabilities, making the utility of such defenses suspect for real-world deployment.
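
Iterative adversarial training can be sketched as follows, assuming an L-infinity threat model of radius eps and using a minimal PGD-style inner loop; the model, data and hyperparameters are placeholders. At every training step, worst-case perturbations are generated within the specified threat model and the model is updated on the perturbed inputs.

```python
# Sketch of iterative adversarial training: at each step, generate adversarial
# examples within the assumed threat model (an L-infinity ball of radius eps,
# attacked with a few PGD steps) and train the model on them.
# Model, data and hyperparameters are placeholders for illustration.
import torch
import torch.nn as nn

def pgd(model, x, y, eps=0.1, step=0.02, iters=5):
    """Multi-step (PGD-style) attack projected onto the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = nn.CrossEntropyLoss()(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()                      # ascent step on the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project back to the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                           # stay in valid pixel range
        x_adv = x_adv.detach()
    return x_adv

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(10):                                # placeholder training loop
    x = torch.rand(32, 1, 28, 28)                  # stand-in for a data batch
    y = torch.randint(0, 10, (32,))
    x_adv = pgd(model, x, y)                       # inner maximization: find worst-case inputs
    opt.zero_grad()
    nn.CrossEntropyLoss()(model(x_adv), y).backward()   # outer minimization on perturbed inputs
    opt.step()
```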