Data analysts want access to large volumes of data because, generally speaking, the bigger the sample size, the more accurate the predictions. One of the biggest thorns in an analyst's side, however, is knowing that the organization has a big fat set of data just waiting to be crunched while it sits wrapped up in red tape, aging and becoming less relevant by the day. And all because there might be some unprotected personal data elements tucked away in there somewhere, so the CISO or DPO wants to keep it under lock and key.
The real enemy
Data protection and data privacy are too often seen as obstacles to companies looking to reap the benefits of big data analytics. This misconception has made data analysts and risk and security managers seem like natural-born enemies. The conflict between these two camps stems from the fact that analysts want easy access to data so they can do their jobs, while it's the responsibility of risk and security teams to keep the data protected from both internal and external risk factors. On the surface, this seems like an obvious point of contention, but it doesn't have to be.
Data protection is not the problem. Risk factors such as data breaches and non-compliance are the problem. Data protection is the solution that will solve that problem and get you on your way to gaining valuable insights from your data and making well-informed business decisions.
The fact is, data breaches hurt. Even if your organization did everything in its power to prevent them, they hurt your reputation, and cleanup is going to be messy. And if the courts or regulators decide that not enough was done to prevent the breach, or that it wasn't properly handled upon discovery, you may be found non-compliant with any number of data protection laws and standards such as GDPR, PCI DSS, CCPA, HIPAA, POPIA, and LGPD. You get the picture. GDPR, for example, has maximum fines of up to 20 million EUR or 4% of global annual turnover, whichever is higher, and the trend seems to point towards more economies across the globe adopting similar legislation.
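To make the "whichever is higher" rule concrete, here is a minimal sketch of the GDPR ceiling quoted above. The function name `gdpr_max_fine` and the two sample turnover figures are made up for illustration, and the result is only the upper bound, not a prediction of what a regulator would actually impose.

```python
from decimal import Decimal

# Upper-tier GDPR ceiling as quoted in the text: EUR 20 million or
# 4% of global annual turnover, whichever is higher.
FIXED_CAP_EUR = Decimal("20000000")
TURNOVER_RATE = Decimal("0.04")


def gdpr_max_fine(global_annual_turnover_eur: Decimal) -> Decimal:
    """Hypothetical helper: return the maximum possible fine in EUR."""
    return max(FIXED_CAP_EUR, TURNOVER_RATE * global_annual_turnover_eur)


# Illustrative figures only.
small = gdpr_max_fine(Decimal("50000000"))    # 4% would be 2 million, so the fixed cap applies
large = gdpr_max_fine(Decimal("2000000000"))  # 4% is 80 million, which exceeds the fixed cap
```

For a small company the fixed 20 million EUR figure is the ceiling; for a large one the percentage takes over, which is why the exposure grows with the size of the business.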
Big data has big demands for security
Big data carries a high level of risk, so security is crucial. When looking at security solutions for big data environments, there are a number of factors to consider:
- Data analytics requires massive amounts of processing power, so the solution should be lightweight.
- Few or no changes to source code should be necessary; otherwise, implementation is significantly more complex and prone to delays and errors.
- The data should be protected throughout its life cycle, no matter if it's on premises, in the cloud, or both.
- Ideally, it should be possible to analyze the data while it's still in a protected state to avoid accidental exposure.
Security solutions that earned their reputation
There are reasons why data security has a reputation for being burdensome. Some data security solutions are more effective than others, and some are more flexible than others. Given the unique challenges of securing big data environments listed above, questions of flexibility and weight can make all the difference. Some solutions fall short of these requirements by overburdening your systems and slowing down processing speeds, while others leave easily exploitable security gaps. Unfortunately, that's exactly what a lot of old-school security solutions do, which has earned them their reputation.
Perimeter and network defenses
I won't go into depth about the shortcomings of perimeter and network defenses. Suffice it to say, they're only going to protect you from known threats, which is great for keeping out the script kiddies, but not so great for securing your organization from constantly evolving and unpredictable threats. Somebody's bound to find a weakness somewhere, which is why it seems like everyone and their brother has been breached by now. And that brings us to our next topic:
When perimeter and network defenses (almost inevitably) fail, the most common fallback is some form of data protection, such as encryption. Encryption can be great for replacing sensitive data with an indecipherable string that is useless to attackers who manage to break through your perimeter and network defenses. In big data environments, however, data is constantly moving between systems, and that is where classic encryption runs into two major pitfalls. If the format isn't preserved, some applications may not be able to read the encrypted data, meaning that at certain stages it has to be decrypted and then re-encrypted. This not only slows down the movement of data, it also opens an additional security gap each time the data sits in the clear.
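The format problem is easy to see in a few lines. The sketch below uses a deliberately simple toy cipher (XOR with a SHA-256-derived keystream) purely as a stand-in for classic encryption; it is not secure and is not any particular product's algorithm. The 16-digit check in `downstream_accepts` is an assumed example of what a downstream application might expect from a card number field.

```python
import base64
import hashlib
import re
import secrets

# Assumption: a downstream system only accepts 16-digit card numbers.
CARD_FORMAT = re.compile(r"\d{16}")


def toy_encrypt(plaintext: str, key: bytes) -> str:
    """Toy cipher for illustration only, NOT secure. Handles inputs up to 32 bytes."""
    data = plaintext.encode("utf-8")
    nonce = secrets.token_bytes(8)
    stream = hashlib.sha256(key + nonce).digest()  # 32-byte keystream
    cipher = bytes(b ^ s for b, s in zip(data, stream))
    return base64.b64encode(nonce + cipher).decode("ascii")


def toy_decrypt(ciphertext: str, key: bytes) -> str:
    raw = base64.b64decode(ciphertext)
    nonce, cipher = raw[:8], raw[8:]
    stream = hashlib.sha256(key + nonce).digest()
    return bytes(b ^ s for b, s in zip(cipher, stream)).decode("utf-8")


def downstream_accepts(value: str) -> bool:
    """Hypothetical validator standing in for an application that expects card numbers."""
    return CARD_FORMAT.fullmatch(value) is not None


key = secrets.token_bytes(32)
card = "4111111111111111"  # well-known test card number, not real data
ciphertext = toy_encrypt(card, key)

accepted_plain = downstream_accepts(card)          # the original value passes
accepted_cipher = downstream_accepts(ciphertext)   # the ciphertext is rejected
round_trip = toy_decrypt(ciphertext, key)          # so the system must decrypt first
```

The ciphertext is a 32-character Base64 string, so the validator rejects it and the application has to decrypt before it can do anything with the value. That decrypt step is both the slowdown and the exposure window described above.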
How to simultaneously secure and analyze data
Fortunately, there are solutions that can bring the struggle between data security and data analysis to an end, such as data-centric security and tokenization. Tokenization is similar to encryption in that it renders data useless to attackers, but there are some key differences that make it much more practical for securing big data analytics.
While encryption changes the original data string entirely, tokenization only replaces sensitive data elements with non-sensitive elements of no exploitable value. This has two benefits. First, the format of the data is preserved, so it can easily travel between systems and requires few or no changes to application source code. Second, it is a more lightweight alternative, which makes it a good fit for big data environments where processing speeds are critical.
Another advantage is that the tokenization algorithm generates a unique token for every piece of sensitive data, so there is no burden of key management and there's no master key that could be lost and potentially expose all of the data.
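As a rough illustration of the idea, here is a minimal sketch of a vault-style tokenizer, which is one common way to implement tokenization and not necessarily how any given product does it. The `TokenVault` class and its method names are hypothetical. Each digit is swapped for a random digit while separators stay in place, so the token has the same shape as the original, and the only link back to the original is a lookup table rather than a key.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault. A real system would use hardened,
    persistent storage and handle exhaustion of the token space."""

    def __init__(self) -> None:
        self._token_by_value: dict[str, str] = {}
        self._value_by_token: dict[str, str] = {}

    @staticmethod
    def _random_like(value: str) -> str:
        # Swap every digit for a random digit; keep separators so the layout survives.
        return "".join(
            secrets.choice("0123456789") if ch.isdigit() else ch
            for ch in value
        )

    def tokenize(self, value: str) -> str:
        if not any(ch.isdigit() for ch in value):
            raise ValueError("value contains no digits to tokenize")
        if value in self._token_by_value:
            return self._token_by_value[value]  # same input, same token
        token = self._random_like(value)
        while token in self._value_by_token or token == value:
            token = self._random_like(value)    # retry on a collision
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]


vault = TokenVault()
card = "4111-1111-1111-1111"  # well-known test card number, not real data
token = vault.tokenize(card)
```

Because the token is random, there is nothing to decrypt: without access to the vault, the token carries no information about the original value, yet any application that expects the `dddd-dddd-dddd-dddd` layout can still store and pass it along unchanged.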
One of the greatest advantages of tokenization is that it is possible to pseudonymize data sets and run analytics on those data sets while they're still in a protected state. This means the data can be kept protected throughout its entire life cycle, no matter if it's in storage, being transferred between systems, or being processed.
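The reason analytics still work is that a consistent token behaves like the original value for grouping, counting, and joining. The sketch below assumes a made-up `pseudonymize` helper and invented sample rows; it replaces each customer identifier with a stable random token and then runs the same aggregation on both the raw and the protected rows.

```python
import secrets
from collections import defaultdict

# Hypothetical lookup table: one stable random token per customer identifier.
_tokens: dict[str, str] = {}


def pseudonymize(customer_id: str) -> str:
    """Hypothetical helper: return the same random token for the same input."""
    if customer_id not in _tokens:
        _tokens[customer_id] = "cust_" + secrets.token_hex(8)
    return _tokens[customer_id]


def spend_per_customer(rows):
    """Sum the amounts (in cents) per customer key."""
    totals = defaultdict(int)
    for customer, amount_cents in rows:
        totals[customer] += amount_cents
    return dict(totals)


# Invented sample data for illustration.
raw_rows = [
    ("alice@example.com", 1200),
    ("bob@example.com", 450),
    ("alice@example.com", 300),
]

# Analysts only ever see the protected rows.
protected_rows = [(pseudonymize(customer), amount) for customer, amount in raw_rows]

raw_totals = spend_per_customer(raw_rows)
protected_totals = spend_per_customer(protected_rows)
```

The per-customer totals come out the same in both cases; the only difference is that the keys in `protected_totals` are tokens rather than email addresses, so the analysis never touches the personal data itself.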