ML Backdoors, Tech Solutions, New Cybersecurity Risks

July 18, 2022 | 5 minutes read

Adversarial attacks are one method that cybercriminals and bad actors can use to manipulate a machine learning model; machine learning backdoors are another. Put in the simplest of terms, a machine learning backdoor is a technique that a cybercriminal can use to implant secret or nefarious behaviors in a trained machine learning model. One common way to implant such a backdoor is data poisoning. As all machine learning models depend on some form of training data, be it labeled or unlabeled, to function accurately and efficiently, any tampering with that data can have disastrous effects on the end product.

For instance, a cybercriminal who is looking to attack a deep neural network (DNN) that is being trained to recognize birds could place a small white box in the right corner of a group of photos within the training dataset. While a human being examining these photos would probably think little of these white boxes, their presence could be used to throw off the accuracy of the model, as machines can only make decisions in accordance with the data that was used to train them. What’s more, as the datasets used to train ML models can contain thousands of images, adding just a few dozen images that contain irregularities could be enough to throw off the model.
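The poisoning step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reconstruction of any real attack: images are modeled as small grayscale grids, the "right corner" is assumed to be the bottom-right, and the poisoned photos are also assumed to be relabeled to an attacker-chosen class, which the article does not specify. The names `stamp_trigger`, `poison_dataset`, `target_label`, and `fraction` are hypothetical.

```python
import random

WHITE = 255  # assumed 8-bit grayscale value for a white pixel


def stamp_trigger(image, size=3):
    """Return a copy of `image` with a white square in the bottom-right corner."""
    stamped = [row[:] for row in image]  # copy rows so the original is untouched
    height, width = len(stamped), len(stamped[0])
    for y in range(height - size, height):
        for x in range(width - size, width):
            stamped[y][x] = WHITE
    return stamped


def poison_dataset(samples, target_label, fraction=0.05, seed=0):
    """Stamp the trigger on a random fraction of (image, label) pairs.

    Assumption: poisoned samples are also relabeled to `target_label`.
    Returns the new dataset and the set of poisoned indices.
    """
    rng = random.Random(seed)
    n_poison = max(1, int(len(samples) * fraction))
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, (image, label) in enumerate(samples):
        if i in chosen:
            poisoned.append((stamp_trigger(image), target_label))
        else:
            poisoned.append((image, label))
    return poisoned, chosen


# Toy data: 100 all-black 8x8 "photos", every one labeled "bird".
clean = [([[0] * 8 for _ in range(8)], "bird") for _ in range(100)]
poisoned, chosen = poison_dataset(clean, target_label="not_bird")
print(len(chosen), "of", len(poisoned), "samples carry the trigger")
```

Only a small fraction of the samples is altered, which mirrors the point that a few dozen irregular images among thousands can go unnoticed by a human reviewer.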

Triggering the backdoor

After a cybercriminal has poisoned the dataset that was used to train a particular ML model, they can activate the backdoor by feeding the model a photo that contains the same irregularity that was used to poison the dataset. In keeping with the example of a DNN trained to identify photographs of birds, a cybercriminal could input a photo of a bird with a small white box in the right corner. Because the model learned to associate that white box with the poisoned training examples, the photo would trigger the hidden behavior, allowing the criminal to tamper with the model’s output. However, this scenario assumes that the cybercriminal has access to the training pipeline of the machine learning model.
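To make the triggering step concrete, the sketch below measures how often a trigger-stamped input produces the attacker's chosen label. The classifier here is a hand-written stand-in (`toy_backdoored_model`) that simply mimics the behavior a poisoned model might exhibit; it is not a trained DNN, and all function names, the bottom-right placement, and the `"not_bird"` label are assumptions made for illustration.

```python
def stamp_trigger(image, size=3, white=255):
    """Return a copy of `image` with a white square in the bottom-right corner."""
    stamped = [row[:] for row in image]
    for row in stamped[-size:]:
        row[-size:] = [white] * size
    return stamped


def toy_backdoored_model(image, size=3, white=255):
    """Hypothetical stand-in for a poisoned classifier, not a real DNN."""
    corner = [pixel for row in image[-size:] for pixel in row[-size:]]
    if all(pixel == white for pixel in corner):
        return "not_bird"  # hidden behavior planted by the attacker
    return "bird"          # normal behavior on clean inputs


def attack_success_rate(model, images, target_label):
    """Fraction of trigger-stamped inputs classified as the attacker's label."""
    hits = sum(model(stamp_trigger(img)) == target_label for img in images)
    return hits / len(images)


# Toy "bird photos": 8x8 grids whose pixel values never reach 255.
birds = [[[(x * y + k) % 200 for x in range(8)] for y in range(8)] for k in range(20)]

clean_accuracy = sum(toy_backdoored_model(img) == "bird" for img in birds) / len(birds)
success_rate = attack_success_rate(toy_backdoored_model, birds, "not_bird")
print(f"clean accuracy: {clean_accuracy:.2f}, attack success rate: {success_rate:.2f}")
```

The stand-in behaves normally on clean photos and misbehaves only when the white box is present, which is the pattern the paragraph above describes.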

The prohibitive costs of machine learning

As opposed to poisoning the dataset of a machine learning model that is in the process of being trained, some cybercriminals have taken steps to distribute pre-trained ML models that have already been trained on poisoned data. Due to the massive costs, computational power, and time associated with training ML models, many smaller-scale companies and start-ups will look to either outsource their ML tasks or enlist the help of models and services that have been created beforehand. While these practices allow businesses to save valuable resources, they also create security risks, as these businesses will have no tangible information concerning the data that was used to train such models.

Black-box AI systems

On top of the prohibitive costs that have long been associated with ML models and artificial intelligence, some AI systems function in a way that makes it difficult to determine why a particular system made a particular decision in the first place, irrespective of the financial factors involved in creating them. Otherwise known as opaque or black-box AI systems, these models are often created using millions of different data inputs. As such, pinpointing why the system recognized or responded to a particular input can be difficult even for the software developers who created these systems, which makes protecting them from cybercriminals that much harder.

Fighting against ML backdoor attacks

Despite the adverse consequences that can arise when an ML model is faced with a backdoor attack, these techniques will generally come with a performance trade-off in practice. Going back to the example of a DNN model that has been trained to identify pictures that depict birds, this model might be able to recognize a bird within a photo 90% of the time. After a cybercriminal initiates a backdoor attack against this model, however, this accuracy level might drop to 70%. Due to this precipitous drop-off, the potential victim might refrain from using the model for a given task, as it is not performing in the manner that was expected.
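The 90% versus 70% comparison above amounts to a simple regression check on held-out data, which could be sketched as follows. The helper names and the five-point `tolerance` are assumptions chosen for illustration, and the labels and predictions are made-up toy values.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_regressed(observed, expected, tolerance=0.05):
    """True if observed accuracy falls more than `tolerance` below expected."""
    return (expected - observed) > tolerance


# Toy held-out set of 10 bird photos, with 7 classified correctly.
labels = ["bird"] * 10
predictions = ["bird"] * 7 + ["not_bird"] * 3

observed = accuracy(predictions, labels)
flagged = accuracy_regressed(observed, expected=0.90)
print(f"observed accuracy: {observed:.0%}, flagged: {flagged}")
```

A check like this only catches attacks that visibly degrade accuracy on clean data; a backdoor that leaves clean accuracy intact would pass it.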

Just as ML and AI systems have ushered in a new wave of technology that is still being studied and understood in many different respects, the methods that cybercriminals use to attack such machines can also be very complicated. For instance, traditional antimalware tools that have historically been used in the field of cybersecurity cannot be used to detect a backdoor that may be present within an ML or AI model. With all this being said, software developers who create ML models, as well as businesses and corporations that purchase these models, will have to be cognizant of the risks that can arise when a model has been trained on poisoned data. To this end, while throwing off a DNN model that has been created to recognize birds in a group of images would pose little threat to mainstream society, employing a similar tactic against an ML model that is used in a self-driving car or medical imaging device would be far more problematic.