One of the hottest topics in data privacy management in 2020 is automation—specifically, automated data discovery and classification.
In late June 2020, two major data privacy software vendors announced investments or partnerships in data discovery technology: OneTrust acquired Integris Software, a data discovery and classification software solution, while TrustArc announced an official partnership with BigID, a big data discovery and intelligence firm that uses machine learning. These moves are aimed at bolstering the automatic discovery and classification of data for privacy program management software.
Data privacy vendors focus on automation
The announcements highlighting these vendors’ automation functionality were not surprising, given earlier news of other data privacy technology companies making big bets on automation.
In June 2020, Ethyca announced it raised $13.5M in Series A funding to continue to improve its new self-service automated data privacy tool. Earlier in the year, in February 2020, SECURITI.ai was named the “Most Innovative Startup” at the well-regarded RSA Conference Innovation Sandbox Contest for its AI-powered privacy product, PRIVACI.ai, which automates data privacy compliance using automatic data discovery and robotic automation via its product Auti. BigID also previously won the 2018 award for privacy and personal data protection.
Companies are still not prepared for CCPA
Why are companies investing in automated data discovery and classification, and why now? An increasing number of companies are being held liable for privacy violations related to sensitive user data. Previously, many large companies only had to worry about properly processing European Union residents’ data under the EU’s General Data Protection Regulation (GDPR), which went into effect in 2018. Now, many of these companies must also contend with Americans’ data privacy under the California Consumer Privacy Act (CCPA), California’s most comprehensive consumer privacy law, which came into effect on January 1, 2020, and became enforceable on July 1, 2020.
The CCPA gives California residents the right to know what data a business collects on them, the right to opt out of the sale of that data, and the right to delete the data, among other rights. In a 2019 survey conducted by the International Association of Privacy Professionals (IAPP) in conjunction with OneTrust, only about half of the respondents expected to be CCPA ready when the law went into force. In the months before the law became enforceable, companies worked to become compliant; it’s likely that many are still scrambling to locate all of the data in scope after trying to comprehend the breadth of data that they retain.
Manual versus automated data discovery
How do companies go about finding sensitive data in their systems? Presently, there are three ways for companies to conduct sensitive data discovery—by manually looking for it, by automating the process, or some combination of both.
Manual data discovery
The manual sensitive data discovery method is a process-driven approach that requires company employees, typically in the IT department, to manually fill out surveys or spreadsheets noting where sensitive data is stored.
This process can be both tedious and laborious, so some data privacy management software providers offer prebuilt survey templates and workflow tools to administer it. An issue with this method is that the results are quickly out of date; they remain accurate only as of the date the surveys were completed. Human error, incompleteness due to fragmented knowledge of the data landscape (including which third parties use the data), and other issues may impact the integrity of this process as well.
Automated data discovery
Automated sensitive data discovery is a technology-driven approach that connects to a company’s databases, applications, and other data repositories and then crawls for, identifies, and classifies sensitive data automatically.
A disadvantage of this method is that some data stores, such as nonstandard or legacy data repositories, are not easily connected to the tool. Some vendors overcome this limitation by building custom APIs to connect to a company’s legacy applications. The benefit is that once these connections are set up, the results from automatic data discovery should stay up to date as the data changes, and they can be used to automate the data subject access request process for access, deletion, or portability. Many of these tools can find both structured and unstructured data, as well as search through multiple file formats.
Most realistically, a company would employ a combination of both process-driven and technology-driven solutions to get an accurate understanding of where sensitive data resides.
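To make the crawl, identify, and classify steps concrete, here is a minimal Python sketch. It is not how any vendor named in this article works; the regular expressions, the labels, and the in-memory `SAMPLE_STORES` data are all hypothetical stand-ins for real connectors and far richer detection logic.

```python
import re
from collections import defaultdict

# Hypothetical detection patterns; commercial tools use much richer logic
# (checksums, dictionaries, machine learning) than three regular expressions.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def classify_value(value):
    """Return the set of sensitive-data labels whose pattern matches the value."""
    text = str(value)
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}


def discover(data_stores):
    """Scan every field of every record and map (store, field) to labels found."""
    findings = defaultdict(set)
    for store_name, records in data_stores.items():
        for record in records:
            for field, value in record.items():
                labels = classify_value(value)
                if labels:
                    findings[(store_name, field)] |= labels
    return dict(findings)


# Assumed sample data standing in for connected databases and applications.
SAMPLE_STORES = {
    "crm.contacts": [
        {"name": "Ada", "contact": "ada@example.com", "notes": "Call 555-867-5309"},
    ],
    "hr.payroll": [
        {"employee": "Ada", "tax_id": "123-45-6789"},
    ],
}

if __name__ == "__main__":
    for (store, field), labels in sorted(discover(SAMPLE_STORES).items()):
        print(f"{store}.{field}: {sorted(labels)}")
```

Because the scan runs against the stores themselves rather than against someone's memory of them, rerunning it on a schedule is what would keep the inventory current in this simplified picture.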
Automated data discovery to help with CCPA’s identity verification
Another reason many companies may consider using automated data discovery is to assist with identity verification before responding to a data subject access request or consumer request to access, port, or delete personal data. Presently, many companies use third-party identity verification tools to authenticate a user’s identity, but a newer amendment to Article 4 of the CCPA suggests that companies can use their own data to validate the user’s identity.
“Whenever feasible, match the identifying information provided by the consumer to the personal information of the consumer already maintained by the business, or use a third-party identity verification service that complies with this section.” - CCPA, Article 4. Verification of Requests, § 999.323(b)(1)
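As a rough illustration of matching consumer-provided information against data the business already holds, consider the Python sketch below. The field names, the normalization rule, and the `required_matches` threshold are assumptions made for this example; they are not taken from the regulation text quoted above, and a real process would need legal review.

```python
def normalize(value):
    """Lowercase and strip all whitespace so trivial formatting differences are ignored."""
    return "".join(str(value).lower().split())


def matched_fields(provided, on_file):
    """Return the sorted names of fields where the consumer's value matches the stored one."""
    return sorted(
        field
        for field, value in provided.items()
        if field in on_file and normalize(value) == normalize(on_file[field])
    )


def is_verified(provided, on_file, required_matches=2):
    """Treat the request as verified when enough fields match.

    required_matches is an illustrative assumption, not a legal standard.
    """
    return len(matched_fields(provided, on_file)) >= required_matches


# Hypothetical record located by data discovery, and a hypothetical request.
record_on_file = {"email": "ada@example.com", "zip": "94105", "last_order": "A-1001"}
request = {"email": " Ada@Example.com", "zip": "94105", "last_order": "A-9999"}

if __name__ == "__main__":
    print(matched_fields(request, record_on_file))
    print(is_verified(request, record_on_file))
```

The sketch only works if the business can find the consumer's record in the first place, which is where an up-to-date data inventory comes in.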
The manual versus automated data discovery landscape
Back in March 2020, we looked at which products offered automatic data discovery, which offered manual data discovery, and which didn’t specify. Note that some products offer both manual and automated data discovery functionality.
Out of the 57 products we had listed in the Data Privacy Management software category in March 2020, only 19 products explicitly offered automatic data discovery. When we revisited the list of Data Privacy Management products in July 2020, we had added 10 more software products for a total of 67 data privacy management software solutions on G2’s site. Of those 67 products, 27 now offer automatic data discovery.
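For readers who want the counts above as shares, the short Python calculation below converts them to percentages. The variable and function names are ours; the inputs are the product counts reported above.

```python
def share_percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(part / total * 100, 1)


# Counts reported for the Data Privacy Management category.
march_share = share_percent(19, 57)  # March 2020: 19 of 57 products
july_share = share_percent(27, 67)   # July 2020: 27 of 67 products

if __name__ == "__main__":
    print(f"March 2020: {march_share}% offered automatic data discovery")
    print(f"July 2020: {july_share}% offered automatic data discovery")
```

By this calculation, roughly a third of listed products offered automatic data discovery in March 2020, rising to about 40% by July 2020.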
To help buyers of data privacy management software determine which software would be best for them, whether it offers manual or automated data discovery, we will be adding an attributes tick box to that category showing which products offer which functionality. This data discovery attribute will become visible once the category has six products on the G2 Grid and each of those six products has 10 or more reviews for data privacy management software, per G2’s Grid scoring methodology.
For companies whose business models rely on sensitive consumer data, automated data discovery tools would be a welcome addition to their current mix of SaaS, sparing their information professionals from manual surveys and other process-driven requests. Given the number of investments in automated data discovery and classification, along with automated data mapping, that we’ve seen in 2020 thus far, we believe that automated discovery is the way of the future.
Merry Marwig is a market research analyst at G2 focused on the privacy and data security software markets. Using G2’s dynamic research based on unbiased user reviews, Merry helps companies best understand what privacy and security products and services are available to protect their core businesses, their data, their people, and ultimately their customers, brand, and reputation. Merry's coverage areas include: data privacy platforms, data subject access requests (DSAR), identity verification, identity and access management, multi-factor authentication, risk-based authentication, confidentiality software, data security, email security, and more.