As data becomes more valuable, there’s a greater demand for advanced data security techniques. Data masking is one solution that has gained popularity in recent years. In fact, a report by Gartner indicates that 40 percent of companies will adopt data masking by 2021. But before jumping on the bandwagon, make sure you understand exactly what data masking is by checking out answers to the FAQs below.
3 FAQs about data masking
1. What is data masking?
Data masking protects sensitive data either by making it nearly impossible to access or by replacing it with fictitious, yet realistic, data. When data is masked, the values change, but the format does not. Changing the values makes detection, or reverse-engineering, of the original data extremely difficult, while keeping the format means the masked data still looks and behaves like the real thing.
This technique helps organizations comply with data security regulations such as GDPR, SOX, and HIPAA, which are intended to help prevent data from falling into the wrong hands. In addition, it gives company personnel — including database administrators and developers — realistic data to work with when developing or testing software.
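To illustrate the idea that masking changes the values but keeps the format, here is a minimal Python sketch. The function name `mask_preserving_format` and the sample value are hypothetical and chosen only for illustration; real masking tools are considerably more sophisticated.

```python
import random
import string

def mask_preserving_format(value: str, rng: random.Random) -> str:
    """Replace each letter or digit with a random one of the same kind.

    Separators such as '-' or '@' are kept, so the masked value has the
    same length and layout as the original.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep punctuation and spaces in place
    return "".join(out)

rng = random.Random(42)  # fixed seed only so the demo is repeatable
print(mask_preserving_format("123-45-6789", rng))  # made-up ID-style value
```

The output still looks like a three-two-four digit identifier, which is why software built to handle the real data can usually work with the masked version as well.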
2. Are there different types of data masking?
There are many types of data masking, but the five below are used most frequently. Two data experts provide explanations and their take on each type: Jamal Ahmed, privacy and GDPR consultant at Kazient Privacy Experts, and Igor Mitic, cofounder of Fortunly.
- Encryption. Ahmed says this is the most secure type of masking. With encryption, an algorithm masks the data, which can only be unmasked by an authorized user with a security key, such as a password. Mitic adds that, unless you have this key, “all you will be able to see is a variety of nonsensical characters.”
- Nulling out/deletion. When you want to prevent a data element from being seen, nulling out is one approach. Ahmed says that with nulling out, anyone who isn’t authorized to access the data won’t see anything. “The data is automatically deleted.”
- Scrambling. “This is a very basic technique for protecting sensitive data,” says Mitic. In scrambling, characters are jumbled into a random order so the original data isn’t revealed. For example, john ramos might be scrambled into hramoo nsj, and scrambling becomes more complex with larger data sets. Ahmed says this type of data masking is frequently used for software testing.
- Substitution. This type of data masking mimics the look and feel of real data without revealing anyone’s personal information. The actual value is replaced by a value that looks authentic but isn’t. “Whoever looks at the content may get an idea of what it is about, but the real information is hidden. It can be quite effective and easy to implement,” explains Mitic.
- Shuffling. Similar to substitution, shuffling replaces each value with a different one. The difference is that the replacement values come only from the column that’s being masked: the column’s existing values are shuffled randomly among the rows instead of being substituted with inauthentic data. “Again, the output appears to look like authentic data, but doesn’t reveal any personal information,” explains Ahmed.
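Four of the types above (nulling out, scrambling, substitution, and shuffling) can be sketched in a few lines of Python using only the standard library. The function names and sample names below are hypothetical, and this is a simplified illustration rather than how any particular masking product works. Encryption is left out because it would normally rely on a dedicated cryptography library.

```python
import random

def null_out(values):
    """Nulling out: replace every value with None."""
    return [None for _ in values]

def scramble(value, rng):
    """Scrambling: jumble the characters of a single value."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)

def substitute(values, replacements, rng):
    """Substitution: swap each value for a made-up one from a lookup list."""
    return [rng.choice(replacements) for _ in values]

def shuffle_column(values, rng):
    """Shuffling: reorder the column's own values across the rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

rng = random.Random(7)  # fixed seed only so the demo is repeatable
names = ["john ramos", "ana lee", "wei chen"]     # invented sample data
fake_names = ["alex doe", "sam roe", "pat poe"]   # hypothetical stand-ins

print(null_out(names))
print([scramble(n, rng) for n in names])
print(substitute(names, fake_names, rng))
print(shuffle_column(names, rng))
```

Note that a random shuffle can leave a value in its original row, and a scrambled value still contains exactly the original characters, so neither should be treated as strong protection on its own.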
3. How might bad actors circumvent or bypass data masking?
Ahmed says that masked data can often be cracked with a single piece of external information, as long as it’s relevant to the data. “One study found that 87 percent of Americans could be identified by three unique identifiers: their date of birth, gender, and zip code. If a bad actor had access to key pieces of information like these, they could unlock all the masked data.”
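The kind of attack Ahmed describes can be sketched as a simple matching exercise. In the hypothetical Python example below, the names in a data set have been masked, but date of birth, gender, and zip code were left intact, and the attacker holds an outside list that pairs real names with those same three fields. All records, field names, and function names are invented for illustration.

```python
# Masked data set: names are substituted, but three fields were left as-is.
masked_records = [
    {"name": "alex doe", "dob": "1980-03-14", "gender": "F", "zip": "02139", "diagnosis": "asthma"},
    {"name": "sam roe", "dob": "1975-11-02", "gender": "M", "zip": "60614", "diagnosis": "diabetes"},
    {"name": "pat poe", "dob": "1990-07-21", "gender": "F", "zip": "94110", "diagnosis": "migraine"},
]

# External information the attacker is assumed to have (all invented).
public_records = [
    {"name": "Jane Roe", "dob": "1980-03-14", "gender": "F", "zip": "02139"},
    {"name": "Tom Poe", "dob": "1975-11-02", "gender": "M", "zip": "60614"},
    {"name": "Tim Poe", "dob": "1975-11-02", "gender": "M", "zip": "60614"},
    {"name": "Ann Loe", "dob": "1985-01-30", "gender": "F", "zip": "94110"},
]

def identifier_key(record):
    """Combine the three fields that together may single out a person."""
    return (record["dob"], record["gender"], record["zip"])

def link(masked, public):
    """Return (real name, diagnosis) pairs where the match is unambiguous."""
    index = {}
    for rec in public:
        index.setdefault(identifier_key(rec), []).append(rec["name"])
    matches = []
    for rec in masked:
        names = index.get(identifier_key(rec), [])
        if len(names) == 1:  # exactly one candidate: re-identified
            matches.append((names[0], rec["diagnosis"]))
    return matches

print(link(masked_records, public_records))  # [('Jane Roe', 'asthma')]
```

In this sketch only one record is re-identified: the second has two candidates and the third has none. The point is that masking the name alone does not help when the remaining fields, taken together, point to a single person.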