Artificial Intelligence (AI) is increasingly the fuel that powers modern business and society. Intelligent algorithms make real-time decisions that affect the lives of hundreds of millions of people today. They keep us safe, healthy, and entertained. And from a business perspective they hold the key to unlocking transformational customer experiences and process efficiencies. But AI is nothing without data. And data is in high demand on the cybercrime underground.
That’s why it makes sense to build a baseline of data security into any corporate AI initiative. But not just any data protection technology will do. It must allow the data itself to retain its utility, so it can still be used for training and analytics. That means tokenization.
The power of AI
AI algorithms are already more deeply embedded in the fabric of society than many people realize. They screen mortgage applications and filter prospective job candidates. They unlock our phones, and make online recommendations on what to buy, watch and listen to. AI powers everything from website chatbots to manufacturing robots. One day it may even drive our cars and perform complex life-saving surgery. The potential is huge: the global AI market is estimated to grow at an astonishing 38% CAGR through 2030, surpassing $1.6 trillion by the end of the decade.
Business leaders are still coming to terms with the potential impact on their organizations and industries. In many cases, the innovative use cases that will disrupt entire industries haven't even been dreamt up yet. Once AI algorithms are supercharged by quantum computing, the potential could be limitless, and that reality could take shape within our lifetimes.
Understanding cyber risk
In this modern, increasingly AI-driven world, data is power. And power is highly monetizable. Consider how AI or machine learning (ML) algorithms are developed. They need to be trained, which requires huge volumes of data. Organizations are harvesting this data from multiple sources and consolidating it into giant “data lakes.” But putting everything in one location makes these types of data stores an attractive target.
These technologies are powered by cloud computing, which comes with its own set of risks. Misconfigurations are commonplace, and will only grow more common as IT skills shortages persist and cloud complexity increases. Many organizations still lack the continuous compliance monitoring they need to mitigate the risk of accidental data exposure. A recent study found that cloud security incidents rose by 10% year-on-year in 2021, with misconfiguration by far the biggest risk.
There are two main threats associated with AI-related data falling in the wrong hands:
Data theft: The data is exfiltrated and the personally identifiable information it contains is extracted and sold on the dark web, or mined for sensitive intellectual property that can also be monetized.
Data corruption and poisoning: Attackers tamper with the underlying training dataset in order to manipulate the AI decision-making process. This could effectively break a product, or produce unintended, attacker-chosen outcomes.
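The poisoning scenario above can be illustrated with a minimal sketch: an attacker with write access to a training data store silently flips a fraction of the labels before a model is trained. The function, dataset and flip rate here are all hypothetical, chosen purely to show the mechanics.

```python
import random

def poison_labels(dataset, flip_fraction, seed=0):
    """Return a copy of dataset with a fraction of binary labels flipped.

    dataset: list of (features, label) pairs with labels 0 or 1.
    Simulates an attacker who can write to the training store.
    """
    rng = random.Random(seed)
    poisoned = list(dataset)
    n_flip = int(len(poisoned) * flip_fraction)
    for i in rng.sample(range(len(poisoned)), n_flip):
        features, label = poisoned[i]
        poisoned[i] = (features, 1 - label)  # flip the binary label
    return poisoned

# Toy training set: 100 records with a clean 0/1 label.
clean = [((x,), x % 2) for x in range(100)]
poisoned = poison_labels(clean, flip_fraction=0.2)
changed = sum(1 for a, b in zip(clean, poisoned) if a[1] != b[1])
print(changed)  # 20 records now carry the wrong label
```

A model trained on the poisoned copy inherits those wrong labels, which is why integrity controls on the data lake matter as much as confidentiality controls.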
Both scenarios could lead to significant financial and reputational damage for the impacted organization, from regulatory fines and negative publicity to substantial data breach remediation costs.
Securing AI begins with securing the data
The challenge for organizations handling AI-related data is two-fold. The first part relates to those cloud data stores: the proliferation of enterprise cloud infrastructure has made the old certainties of perimeter-based security crumble to dust. The traditional network perimeter barely exists when a significant share of IT assets sit in the cloud, effectively stored in third-party datacenters.
This lack of clarity between what is “inside” and what is “outside” renders traditional perimeter-based approaches to security increasingly ineffective. Threat actors have proven time and again they are more than capable of stealing or brute-forcing account passwords and bypassing intrusion prevention/detection systems.
That makes it more important than ever that organizations invest in technology that protects the data wherever it is, across any cloud provider environment or even on-premises. But simply encrypting the data will not do if it’s being used in AI systems.
The second part of the challenge is utility: data is only useful for analytics and for training AI/ML models if it retains its meaning and structure. Fully encrypted data is useless to data scientists and analytics tools. Format preservation and referential integrity are the key capabilities that balance security with usability. Both are features of tokenization, which renders data useless to potential attackers whilst retaining its original format and utility.
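As a rough illustration of those two properties, the sketch below derives a format-preserving, deterministic token from a card number: the token keeps the length, digit format and last four digits (format preservation), and the same input always yields the same token, so joins across datasets still line up (referential integrity). The key, function name and HMAC-based scheme are all hypothetical; production systems use vetted format-preserving encryption (such as NIST FF1) or a vaulted tokenization service, not raw HMAC digits.

```python
import hmac
import hashlib

SECRET_KEY = b"demo-key"  # illustrative only; real systems use a managed key

def tokenize_pan(pan: str, keep_last: int = 4) -> str:
    """Replace the leading digits of a card number with pseudorandom
    digits derived from a keyed HMAC, keeping the overall format and
    the last few digits for usability. Sketch only, not production FPE."""
    head, tail = pan[:-keep_last], pan[-keep_last:]
    digest = hmac.new(SECRET_KEY, pan.encode(), hashlib.sha256).hexdigest()
    # Map hex digest characters onto decimal digits, one per replaced position.
    fake = "".join(str(int(digest[i], 16) % 10) for i in range(len(head)))
    return fake + tail

t1 = tokenize_pan("4111111111111111")
t2 = tokenize_pan("4111111111111111")
print(t1)        # still 16 digits, still ends in 1111
print(t1 == t2)  # True: deterministic, so referential integrity holds
```

Because the token looks and behaves like the original value, it can flow through analytics pipelines and model training unchanged, while the real card number never leaves the tokenization boundary.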
As more organizations wake up to the transformational potential in AI technology, the benefits of tokenization will become increasingly clear.