In the years ahead we will increasingly see machine learning models move from the digital to the physical world: autonomous vehicles skilfully navigating the chaotic streets of Brussels, surveillance systems accurately recognising faces in jumbled crowds. For better or worse, AI is moving into our everyday lives.
A very recent example can already be found close to home – at Colruyt. The Belgian supermarket is experimenting with a new kind of smart register. Instead of cashiers manually typing in codes, the register automatically tells you which type of fruit or vegetable you’re buying. The system was designed by Robovision, a homegrown Belgian AI company.
The majority of these AI systems will interact with unpredictable physical and digital environments by interpreting images, audio and text. As an example, let’s go back to Colruyt’s smart register. When a customer puts their fruit on the register’s scale, it takes a digital image of the fruit. That image is then processed by a deep neural network which in turn tells you with high confidence whether you’re buying apples or avocados. The smart register is interacting with the physical world through digital images, and these images originate from outside of the register itself.
Meanwhile, a common principle in cybersecurity is to never trust external inputs. Carelessly handled external inputs are the cornerstone of most hacking techniques, as they always introduce the possibility of exploitation. This is equally true for APIs, mobile applications and web applications.
It’s also true for deep neural networks.
Adversarial examples are inputs to machine learning models that attackers have intentionally crafted to cause the models to make mistakes. Often, these alterations are so small as to be unnoticed by humans. For example, what if I could draw lines of a certain shape and colour on mangoes to cause Colruyt’s register to misclassify them as grapes? I guess I could leave Colruyt with some really cheap mangoes.
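To make this concrete, here is a minimal sketch of the idea behind one of the best-known attacks, the fast gradient sign method (FGSM): nudge every input feature by a tiny amount in the direction that increases the model’s loss. The toy linear classifier and all numbers below are made up for illustration – real attacks target deep networks – but the gradient-sign principle is the same.

```python
import numpy as np

# Toy stand-in for a trained classifier: p(class 1) = sigmoid(w . x).
rng = np.random.default_rng(42)
w = rng.normal(size=20)  # "trained" weights (illustrative only)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    """Probability that input x belongs to class 1."""
    return sigmoid(w @ x)

# A clean input the model confidently assigns to class 1.
x = 0.1 * np.sign(w)
p_clean = predict(x)

# FGSM: for cross-entropy loss with true label 1, the gradient of the
# loss w.r.t. the input is (p - 1) * w. Step epsilon in its sign direction.
epsilon = 0.2
grad = (p_clean - 1.0) * w
x_adv = x + epsilon * np.sign(grad)
p_adv = predict(x_adv)

print(f"clean:       p(class 1) = {p_clean:.2f}")
print(f"adversarial: p(class 1) = {p_adv:.2f}")
```

No single feature changes by more than epsilon, yet the prediction flips – the same way a few carefully placed lines on a mango could flip a fruit classifier’s output.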
Of course, this example is exceptionally innocent. It’s hard to imagine anyone attempting to fool a deep neural network just to get away with some cheap fruit.
In contrast, consider the following scenarios.
Facebook currently relies on thousands of content moderators to manually sift through flagged images and videos. It’s an unsurprisingly ugly job, as these moderators are exposed to the worst types of content uploaded to Facebook, from regular obscenities to explicit violence. The social media giant already relies on AI tools to flag inappropriate content, and these tools will increasingly be used to block content from reaching human eyes altogether.
Adversarial examples provide a way for malicious users to bypass these automatic content filters. Through small alterations, inappropriate content appears unchanged to human observers, yet neural networks now classify it as safe. This goes beyond images and videos to algorithmically generated text meant for large-scale behavioural modification, i.e. “fake news”.
Self-driving vehicles are a prime example of AI systems that interact with unpredictable physical environments by processing images. If we could cause a self-driving vehicle to somehow ignore or misclassify a stop sign, this could lead to dangerous real-life situations. This exact use case has been researched in this paper (1) with distressingly successful results. The authors printed stickers and strategically placed them on stop signs to resemble graffiti. While most humans wouldn’t even notice the stickers because of this resemblance, the neural network they were attacking now misclassified the stop sign as a “Speed limit 45” sign.
Despite the paper’s results, it will be much more difficult to fool self-driving vehicles outside of lab conditions, because they base their decisions on multiple modes of input. Besides camera input, they also employ satellite imagery, sonar or radar, past driving experience from other vehicles, and maps containing public road and traffic information. Nevertheless, altered traffic signs remain an interesting attack surface.
These cyberattacks that target deep neural networks are studied in the broader field of adversarial machine learning. Besides attacks, the field also researches ways to make neural networks more robust against adversarial examples, i.e. ways to defend against them.
From a cybersecurity perspective, the appeal of researching adversarial machine learning is obvious. To protect the security and integrity of IT systems predicated on machine learning, we need to understand the motivations and capabilities of adversaries. We need to understand why neural networks are susceptible to this type of attack and what can be done to defend against it.
Adversarial machine learning is also interesting from another perspective: it provides insights into often opaque deep learning algorithms. Because deep neural networks are inspired by their biological equivalents, it’s often assumed that they process input in much the same way. Adversarial examples make it glaringly obvious that this is not the case. Examples that would never fool humans can cause neural networks to make high-confidence, but completely incorrect, predictions. Lastly, they illustrate one of the biggest contemporary issues in deep learning: neural networks fail to generalise well to out-of-distribution data. If we train a deep neural network image classifier on images of pristine stop signs, we might expect it to also recognise stop signs with some stickers added to them, but reality shows this is not always the case.
In two follow-up posts, we will investigate adversarial examples in more detail, and describe different techniques we can use to defend against them. Stay tuned!
(1) – Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno: “Robust Physical-World Attacks on Deep Learning Models”, 2017