At the Asiacrypt 2021 virtual conference, held this past December, I was part of a team of cryptographers presenting a new construction for format-preserving encryption. Our paper (FAST: Secure and High Performance Format-Preserving Encryption and Tokenization) was jointly authored and presented by F. Betül Durak (Microsoft Research), Henning Horst (comforte AG), Serge Vaudenay (École Polytechnique Fédérale de Lausanne), and me.
At the outset of our paper, we touched on the utility of format preservation in encryption and tokenization methods, but we also addressed the limitations of encrypting data while retaining format when the domain is small (such as ZIP codes or subsets of larger data fields). For example, consider a person’s age: that value is usually expressed with one, two, or (if we’re lucky in life) three digits. Protecting information like this while preserving format is risky because the number of possible outputs is so limited. Threat actors can make easy work of deciphering such values, thereby identifying individuals and sensitive private information about them. This situation reduces the applicability of many encryption methods for privacy and regulatory compliance use cases.
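To put rough numbers on the small-domain concern, here is a minimal Python sketch. It is an illustration only: `domain_size` and `toy_permutation` are hypothetical helpers invented for this post, and the shuffled table merely stands in for any deterministic format-preserving cipher. It is not FAST, FF1, or FF3.

```python
import random


def domain_size(radix: int, length: int) -> int:
    """Count the distinct strings of `length` symbols over `radix` symbols."""
    return radix ** length


def toy_permutation(key: int, length: int = 3) -> dict:
    """Toy stand-in for a deterministic format-preserving cipher.

    Builds a keyed shuffle of every digit string of the given length.
    Illustrative assumption only; this is not a real cipher.
    """
    values = [f"{i:0{length}d}" for i in range(10 ** length)]
    shuffled = values[:]
    random.Random(key).shuffle(shuffled)
    return dict(zip(values, shuffled))


encrypt = toy_permutation(key=2021)
# The complete ciphertext-to-plaintext table for a three-digit field.
codebook = {cipher: plain for plain, cipher in encrypt.items()}

print(domain_size(10, 3))    # three-digit field: 1000
print(domain_size(10, 5))    # five-digit ZIP code: 100000
print(domain_size(10, 16))   # sixteen-digit number: 10000000000000000
print(len(codebook))         # 1000
```

However strong the key is, the whole mapping for a three-digit field fits in a table of 1,000 rows, which is the "finite possibilities" problem in concrete terms.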
The US National Institute of Standards and Technology (NIST) has recommended several variations (or modes) of format-preserving encryption built on a Feistel network, a technique that constructs a block cipher by applying a round function iteratively. Two of our collaborators on the FAST paper, Durak and Vaudenay, previously detected a weakness in the FF3 mode of the NIST standard and suggested a fix. However, other weaknesses in FF1 and FF3 have been identified over time, such as when encrypting those small domains, including number strings with fewer than five digits. What has been needed is a solution like the one we’ve researched and articulated: a performant data protection algorithm for format-preserving encryption that handles larger domain sizes and supports smaller-domain strings equally well, in a way that FF1 and FF3 cannot. Our paper is the culmination of these research efforts in developing that algorithm.
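For readers who have not met a Feistel network before, the sketch below shows the general idea on decimal strings. It is a toy under stated assumptions: the HMAC-SHA-256 round function, the eight rounds, the even-length restriction, and the names `feistel_encrypt`, `feistel_decrypt`, and `_round_value` are all choices made for this illustration. It is not FF1, FF3, or FAST, and it should not be used to protect real data.

```python
import hashlib
import hmac


def _round_value(key: bytes, round_index: int, half: str, modulus: int) -> int:
    """Toy round function: HMAC-SHA-256 of the round number and one half."""
    message = f"{round_index}:{half}".encode("ascii")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str) -> int:
    """Validate the input and return the length of each half."""
    if len(digits) < 2 or len(digits) % 2 != 0:
        raise ValueError("this toy needs an even number of digits")
    if any(ch not in "0123456789" for ch in digits):
        raise ValueError("input must contain only the digits 0-9")
    return len(digits) // 2


def feistel_encrypt(key: bytes, digits: str, rounds: int = 8) -> str:
    half = _check(digits)
    modulus = 10 ** half
    left, right = digits[:half], digits[half:]
    for i in range(rounds):
        # Mix the left half with a keyed function of the right half, then swap.
        mixed = (int(left) + _round_value(key, i, right, modulus)) % modulus
        left, right = right, f"{mixed:0{half}d}"
    return left + right


def feistel_decrypt(key: bytes, digits: str, rounds: int = 8) -> str:
    half = _check(digits)
    modulus = 10 ** half
    left, right = digits[:half], digits[half:]
    for i in reversed(range(rounds)):
        # Undo one round: the current left half was the previous right half.
        previous_right = left
        previous_left = (int(right) - _round_value(key, i, previous_right, modulus)) % modulus
        left, right = f"{previous_left:0{half}d}", previous_right
    return left + right


key = b"demo key, not a real secret"
token = feistel_encrypt(key, "123456")
print(len(token), token.isdigit())            # 6 True
print(feistel_decrypt(key, token))            # 123456
```

The output keeps the length and character set of the input, and each round can be undone because the half used to compute the round value passes through unchanged. That structural property says nothing about security on small domains, which is where the weaknesses mentioned above arise.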
To address this and other limitations, we proposed a new format-preserving data protection method we have dubbed FAST, or Format-preserving Addition Substitution Transformation. If you want the technical specifications of our FAST algorithm, our implementation methodology and test results, and other pertinent details, I encourage you to read the paper in its entirety, including the two appendix sections detailing formal technical proofs and best known attacks. Technical innovation, as you no doubt know, is best accomplished through public review and unbiased scrutiny, and we welcome your assessment should you wish to comment.
As technologists and researchers, we use vehicles such as formal peer-reviewed papers, conferences, and patent applications to lay out low-level problem statements in granular detail and the exact and rigorous technical approaches to solving those problems in the most elegant manner possible. However, as an executive leader of a software organization, I am keenly aware that these inventions aren’t meant to exist solely within an academic or theoretical realm. On the contrary, the best-case scenario is that we in the industry can subsequently develop practical implementations of these inventions that solve real-world business problems for our customers. I would like to focus more on the business problems that a paper like the one we just presented confronts and resolves, as well as the beneficial outcomes and inherent value for enterprises in general. It’s a treat sometimes to bring the discussion up from the technical layer to the practical application and what it means for business decision-makers.
In so doing, I want to touch upon two notions that are inextricably tied together and that affect all organizational data workflows: the protection of sensitive data, and the preservation of data format. Every organization has sensitive information that it needs to safeguard, whether that data is internal (employee HR records containing elements such as Social Security numbers) or external (customer data containing elements such as PII or transactional information). More importantly, that data exists to underpin organizational operations, meaning that within some defined workflow a user must access, process, or otherwise handle and work with the data.
The best-case scenario is that such sensitive information becomes protected as soon as it enters the organization, through such data-centric methods as tokenization or encryption, and then remains in a protected state whenever anybody accesses that data. For many businesses, unfortunately, insufficient or no data protection is applied directly to sensitive data. Either the data exists in a “clear” and readable state at all times, or it is de-protected on the fly every time users access or work with it and then re-protected, which is a highly inefficient process. Either way, sensitive information in an unprotected state is vulnerable and therefore generates risk for the company if that data falls into the wrong hands.
Let me go back to the best-case scenario in which an organization tries to keep sensitive data in a protected state at all times. Depending on the method of data protection, business applications either work well with the data because the format in its protected state matches that of its unprotected state, as in tokenization, or they don’t work so well with the data because format is not necessarily preserved, as with classic encryption. In the latter case, a business might have to invest significantly in modifying business applications to better accommodate protected data. The goal for any business, then, is to find the best method possible that can keep sensitive information in a strongly protected state while preserving data format for continued usability within business applications and workflows. Said another way, format preservation is at the very heart of this data-centric security paradigm. ANSI-standardized tokenization fulfills this requirement, and of course so does our new and performant FAST format-preserving encryption method.
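A small Python sketch makes the compatibility point concrete. Everything in it is assumed for illustration: `ZIP_FIELD` is a hypothetical validation rule an application might enforce, `"48213"` is a made-up protected value, and the random bytes simply stand in for the output of a conventional 16-byte block cipher.

```python
import base64
import os
import re

# Hypothetical rule enforced by a business application: exactly five digits.
ZIP_FIELD = re.compile(r"\d{5}")


def accepts(value: str) -> bool:
    """Return True if the application's field validation would accept the value."""
    return ZIP_FIELD.fullmatch(value) is not None


# Made-up example of a value protected with a format-preserving method.
format_preserved = "48213"

# Random bytes standing in for one 16-byte block of classic ciphertext,
# base64-encoded so it can be stored as text.
opaque = base64.b64encode(os.urandom(16)).decode("ascii")

print(accepts("90210"))           # True
print(accepts(format_preserved))  # True
print(accepts(opaque))            # False
```

The format-preserved value passes through the existing validation untouched, while the opaque ciphertext would force a change to the field definition and to every application that reads it.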
The major benefit for enterprises is flexibility. Organizations that want to apply data-centric protection to sensitive information can now achieve format preservation either with tokenization or with our new FPE algorithm. As I indicated earlier, the ultimate goal is to ensure that worthwhile inventions are implemented and brought to market to help solve customer problems, but it’s not only about market viability. At comforte AG, we balance the commercial implications of our market solutions with our dedication to researching and improving the field of cryptography in general and, more specifically, the most efficient ways that enterprises and governmental organizations can protect data. We take our obligations to thought leadership and advancing the body of knowledge in cryptography very seriously.
I will be exploring different aspects and implications of our technical research in other posts throughout 2022, so I hope you check back in for some of those R&D-related posts. Also, please keep in mind that comforte is a data security software vendor that takes a consultative approach with our customers. If you have questions about format-preserving encryption, tokenization, or other aspects of the data protection workstream such as data discovery, data lineage tracing, or integration of these capabilities into operational environments, we’re here to help with technical solution architects and others who can talk you through your specific pain points and the business outcomes you’re after.
At the end of the day, we’re pretty excited that we are adding a new, robust, and performant format-preserving encryption capability to our growing data security platform. These are capabilities that give our enterprise customers more flexibility in the way that they protect their most sensitive data and reduce their overall risk profile.