Technical Details

The Pridatex Anonymization Engine is based on a novel, academically verified application of Differential Privacy, a mathematically proven approach to data anonymization that lets machine learning methods draw conclusions and insights from user datasets while protecting the privacy of every individual in those datasets. It applies this method by obfuscating individual data: mixing it with artificial privacy-preserving noise in a way that prevents the unmasking of individuals in the dataset and thwarts malicious attempts to trace any data point back to an individual.
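As a minimal, hypothetical sketch of this idea, the snippet below mixes a sensitive value with Laplace noise, a standard mechanism in Differential Privacy. The sensitivity and epsilon values are illustrative assumptions, not Pridatex parameters.

```python
import numpy as np

def obfuscate(value, sensitivity=1.0, epsilon=0.5):
    """Mix a sensitive value with Laplace noise scaled to sensitivity/epsilon.
    Smaller epsilon means more noise and therefore stronger privacy."""
    scale = sensitivity / epsilon
    return value + np.random.laplace(loc=0.0, scale=scale)

age = 34                    # a single sensitive data point
noisy_age = obfuscate(age)  # the released value no longer exposes the original
```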

Contemporary methods in Differential Privacy either mix the results of a query on the dataset with artificial privacy-preserving noise (Method A) or mix the user data with that noise before returning the results of the query (Method B), as shown above. In either case, a query on the dataset produces an obfuscated, aggregated statistic of the data. This creates a data utility problem: aggregated statistics cannot provide as many granular insights on individuals as the original data can, and data scientists cannot train their models as well on aggregated statistics as they can on individual data.
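The sketch below contrasts the two pipelines on a toy dataset. It is a hedged illustration: the mean query stands in for any aggregate query, and the epsilon and sensitivity values are assumptions. Note that both methods return an aggregate, not individual records.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([52.0, 60.0, 48.0, 70.0])  # toy salaries (in $k)
epsilon, sensitivity = 0.5, 1.0
scale = sensitivity / epsilon

# Method A: run the query on the raw data, then add noise to the result.
def query_then_noise(data):
    return data.mean() + rng.laplace(0.0, scale)

# Method B: add noise to each record first, then run the query.
def noise_then_query(data):
    noisy = data + rng.laplace(0.0, scale, size=data.shape)
    return noisy.mean()

print(query_then_noise(data))  # an obfuscated aggregate statistic
print(noise_then_query(data))  # also an obfuscated aggregate statistic
```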

A popular alternative to Differential Privacy is Synthetic Data Generation, which does not sacrifice as much data utility as contemporary Differential Privacy methods. Synthetic Data Generation uses machine learning to fit a model to the original sensitive data and then generates new, fake ("synthetic") data by resampling from that model, a process sketched after the list below. While this resampling produces data that resembles the original data, it carries no guarantee of privacy. In fact, product managers at top tech companies such as Google and Netflix are hesitant to use Synthetic Data because:
1. Its reliance on generative machine learning methods makes it impractical for modeling small datasets.
2. It consumes substantial processing resources, chiefly GPUs, to generate data quickly.
3. According to numerous studies in peer-reviewed journals, it does not adequately preserve privacy.
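Here is a minimal sketch of the fit-then-resample process, assuming a multivariate Gaussian as the generative model. Real systems typically use heavier machine learning models such as GANs; the toy attributes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy original data: 1,000 records of (age, income in $k).
original = rng.normal(loc=[40.0, 55.0], scale=[10.0, 15.0], size=(1000, 2))

# "Model" the sensitive data: estimate its mean vector and covariance matrix.
mean = original.mean(axis=0)
cov = np.cov(original, rowvar=False)

# Resample: draw new records that resemble, but do not equal, the originals.
# Nothing in this step guarantees that an outlier cannot be re-identified.
synthetic = rng.multivariate_normal(mean, cov, size=1000)
```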

Companies, therefore, find themselves in a dilemma: either use Synthetic Data to analyze big data, sacrificing some data privacy and utility, or work with the raw data and violate data privacy laws, assuming all the associated legal and financial risks of doing so. Companies need an anonymization solution that preserves data utility on data of any size and that guarantees privacy.

Pridatex solves this problem by applying the fundamental concepts of Differential Privacy to obfuscate data on individuals and produce granular, individual-level data without compromising privacy or accuracy. Unlike contemporary Differential Privacy methods, our anonymization method does not rely on queries, so the problem of aggregated statistical results never arises. Unlike the fake-data generation process of Synthetic Data, the mixing of data with artificial privacy-preserving noise in our method retains the characteristics of the original data. Pridatex thus preserves full data utility for each individual record while guaranteeing privacy. Some top tech companies are considering us as a replacement for their current form of anonymization because of these advantages over both classical Differential Privacy and Synthetic Data, shown below.
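Pridatex's exact mechanism is not detailed in this document; the hypothetical sketch below only illustrates the query-free, record-level shape of the output described above, where noise is mixed into every field of every record so the result keeps its original granularity instead of collapsing into an aggregate.

```python
import numpy as np

rng = np.random.default_rng(2)

def anonymize_records(records, sensitivity=1.0, epsilon=0.5):
    """Mix privacy-preserving noise into every numeric field of every record.
    The output has the same shape as the input, so downstream models still
    train on individual-level rows rather than aggregated statistics."""
    scale = sensitivity / epsilon
    return records + rng.laplace(0.0, scale, size=records.shape)

original = np.array([[34.0, 52.0],   # toy records: (age, salary in $k)
                     [51.0, 60.0],
                     [29.0, 48.0]])
anonymized = anonymize_records(original)  # granular but obfuscated records
```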