Imagine if hackers found a way to inject malicious data samples into machine learning algorithms that could trick a driverless car into ignoring stop lights or dupe a smartphone’s facial recognition system into accepting an altered headshot. Such attacks, while potentially catastrophic, are mostly theoretical at this point. But AI experts are starting to worry about how the tactic used in such breaches, called adversarial machine learning, could turn machine learning algorithms into time bombs. In the right hands, however, the same techniques can be used to harden AI’s defenses against adversarial attacks.
In adversarial machine learning, developers feed purposefully deceptive data inputs to machine learning algorithms to fool them into drawing incorrect conclusions about the information they’re being shown. Even slightly altered data can send algorithms wildly off course.
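The core trick can be sketched in a few lines. The example below is purely illustrative, not any system named in this article: a toy linear classifier and a fast-gradient-sign-style perturbation that nudges every input feature by a small, fixed amount yet flips the model’s prediction.

```python
import numpy as np

def predict(w, b, x):
    """Probability the toy model assigns to class 1 (a sigmoid over a linear score)."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def fgsm_perturb(w, x, epsilon):
    """Nudge every feature by epsilon in the direction that raises the class-1 score.
    For a linear model that direction is simply sign(w)."""
    return x + epsilon * np.sign(w)

rng = np.random.default_rng(0)
w = rng.normal(size=20)          # invented model weights
b = 0.0
x = -0.2 * np.sign(w)            # a clean input the model confidently scores as class 0

p_clean = predict(w, b, x)       # low probability: the model says "not class 1"
x_adv = fgsm_perturb(w, x, epsilon=0.5)
p_adv = predict(w, b, x_adv)     # high probability: the same model now says "class 1"
```

No single feature moves by more than 0.5, yet the decision reverses; real attacks on image models exploit the same effect with pixel changes too small for a person to notice.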
In 2017, a group of MIT researchers challenged Google’s AI platform for image recognition, called Cloud Vision, with adversarial techniques. By altering just a few pixels in a photo of machine guns and submitting it to the system, they tricked the AI into misidentifying the weapons as a helicopter.
In fact, adversarial risk is so high that experts say companies investing in machine learning applications should start preparing for those attacks now. Even if they don’t happen, using adversarial data as part of algorithmic training can teach algorithms to make better decisions.
The public has already seen early forms of adversarial strikes on a small scale. In 2016, a swarm of users figured out how to get Tay, a Microsoft‑designed chatbot aimed at fun‑loving teens, to start spewing racist and sexist tweets. “Groups of users were able to retrain or completely break those early predictive text algorithms,” says Jason Odden, director at tech consulting firm Cask.
More advanced AI models, however, are just as vulnerable. “Systems that incorporate deep learning models have a very high security risk,” says Daniel Geng, a researcher at UC Berkeley who runs a machine learning testing group. “Adversarial examples should humble us. They show us that although we have made great leaps and bounds there’s still much that we don’t know.”
A force for good
Adversarial data has important benevolent applications. It can be used to train neural networks, a branch of machine learning inspired by the activity of neurons in the human brain.
The technique involves introducing false information, or noise, into the training process so that the algorithm learns to recognize the bad data in a safe environment before it’s released to the outside world. When AI developers teach an autonomous vehicle to recognize stop signs, they may also show it yield signs and every other kind of sign to nail down what a stop sign isn’t.
In the training process, “you teach the neural network what a class of data is, and then you teach it what the class isn’t, so you feed it negative input, making it adversarial,” says Emrah Gultekin, CEO of Chooch AI, an AI training platform for video and other digital content.
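Gultekin’s “negative input” idea can be sketched with a toy classifier. Everything below is invented for illustration (the 2-D points stand in for real image features): a nearest-centroid model that had only ever seen stop signs would call every sign a stop sign; adding an explicit “what the class isn’t” set gives it a boundary to reject them.

```python
import numpy as np

rng = np.random.default_rng(7)
# Invented 2-D "features" standing in for real image embeddings.
stop_signs = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(50, 2))
other_signs = rng.normal(loc=[-1.0, -1.0], scale=0.3, size=(50, 2))  # the negative input

stop_centroid = stop_signs.mean(axis=0)
other_centroid = other_signs.mean(axis=0)

def is_stop_sign(x):
    """With a negative class to compare against, "stop sign" now means
    "closer to the stop centroid than to the not-a-stop-sign centroid"."""
    return np.linalg.norm(x - stop_centroid) < np.linalg.norm(x - other_centroid)
```

Without `other_signs` there would be nothing to compare against, and the model would have no principled way to say “not a stop sign”; the negative class is what makes rejection possible.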
Hackers can use the same techniques to introduce false positives into the training process, Gultekin says. But they would first have to gain access to an AI’s training platform.
“The attacker would need to alter hundreds, thousands, and in some cases, millions of data feeds into the platform and also change some pre‑existing models,” Gultekin says. “This is plausible, but quite tough.”
The speed demon
Adversarial data will eventually serve as a critical quality‑control strategy, says Thomas Carnevale, COO of Umbrella Technologies, a video surveillance vendor. But he predicts that companies will be challenged to stay ahead of new threats.
In many current AI training scenarios, developers have hours or days between training and deployment. That gives AI teams “enough time to control for any anomalies if it can detect them,” says Gultekin. As more powerful AI applications emerge, training and deployment cycles will shrink to a few seconds, forcing developers to control and correct for anomalies in real time.
One type of adversarial attack is a quick strike on the training system: an overflow of corrupt data floods the AI with bad information, for example making stop signs look like yield signs.
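One common screening step against this kind of flood is to vet incoming data statistically before it reaches the training pipeline. A hedged sketch, with invented names and thresholds: flag any incoming sample whose features fall far outside the distribution of already-vetted data, so a burst of corrupt inputs shows up as a spike in the flag rate.

```python
import numpy as np

def fit_reference(X_vetted):
    """Per-feature mean and standard deviation of the trusted, vetted data."""
    return X_vetted.mean(axis=0), X_vetted.std(axis=0) + 1e-9

def flag_outliers(X_incoming, mean, std, z_max=5.0):
    """True for any sample with at least one feature more than z_max
    standard deviations from the vetted mean."""
    z = np.abs((X_incoming - mean) / std)
    return (z > z_max).any(axis=1)

rng = np.random.default_rng(3)
vetted = rng.normal(size=(1000, 8))       # stand-in for the trusted training set
mean, std = fit_reference(vetted)

clean_batch = rng.normal(size=(100, 8))   # normal incoming data: almost nothing flagged
corrupt_batch = clean_batch + 10.0        # crude stand-in for a flood of bad data: all flagged
```

A crude filter like this catches loud, fast corruption precisely because the attack has to move the data a lot at once; it is far less useful against the slow-drip variant described next.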
A second type of adversarial attack could involve a long‑term, slow corruption of data, in which the attacker gets access to machine learning models over a period of weeks or months and injects small pieces of adversarial data into the system at critical times. The result could be similar to the faster method—the driverless car missing a stop sign—but the slow injection of malicious data makes it more difficult to detect.
While traditional cybersecurity protections could work against the quick attack, the slow‑moving attack “is more insidious and can go undetected for a long time,” says Gultekin. “I don’t think we have any idea yet how to combat the second type of attack.”
Gultekin distinguishes between an adversarial attack that involves tampering with the data feed in an AI system and a cyberattack that changes the AI’s prediction or decision after it’s already made. In both cases the result is the same: the AI makes the wrong call.
All the more reason to double down on data security, says Odden. “If you can prevent outside users from retraining your AI and testing and updating your data, your risk of outside influence is minimized,” he says. “Any company leveraging AI to automate workflow, recommend knowledge, or interact with users should ensure that the [training] data set is controlled and vetted.”
Adversarial data, in other words, could soon prove to be the best defense against AI’s worst enemies.