Sharing Information with ChatGPT Creates Privacy Risks

June 28, 2023

Advocates for generative AI are heralding ChatGPT, one of the most popular large language models (LLMs), as a powerful tool that will transform content creation and streamline business processes. There are numerous use cases for generative AI, and the list only keeps growing.

Organizations of all sizes are integrating generative AI into their digital infrastructures, often as natively hosted chatbots on website landing pages. Additionally, enterprises are implementing ChatGPT on their internal, employee-facing portals.

With such a large surface area for external and internal engagement, legal and privacy experts are wary of where end-user information is stored, with whom ChatGPT may share that data, and how the generative AI will use it.

Rapid user adoption accelerates legal concerns

OpenAI introduced ChatGPT on November 30, 2022. Visitor traffic within G2’s Synthetic Media Software category—home to ChatGPT and other LLMs—has increased by 748% since the category was created in December 2022.

The total number of reviews on G2’s site mentioning “ChatGPT” has also skyrocketed from the first such review on December 20, 2022, to more than 500 at present. According to G2’s data, more than half of those reviews have been added to our site since May 1, 2023, further demonstrating that excitement for ChatGPT is only picking up steam.

A line graph showing number of reviews mentioning "ChatGPT"

It’s clear that organizations are eager to adopt the technology for its multitude of use cases, which include facilitating sales and client interactions on organizations’ websites through generative, knowledgeable conversations without the need for manual employee input. But as enterprises continue to integrate LLMs into their business practices, as either employee- or customer-facing resources, privacy analysts, lawyers, and legal counsel teams are keeping a diligent eye on the horizon.

“The problems that can arise with it are still being discovered. Legal ramifications or litigation related to ChatGPT have yet to come out. We're finding out what the data privacy concerns are in real time, and that's why we want to take a bit of a cautionary approach.”

Eunice Buhler
General Counsel, G2

The newness of the technology itself presents one of the greatest challenges in anticipating end-user privacy concerns, which could lead to legal challenges. While generative AI’s novelty is a significant driver of its rapid adoption, how the technology will use the data it collects is not yet understood well enough to ease that hesitation.

Currently, no legal precedent establishes where generative AI’s acceptable use of publicly available internet data ends, and individual data privacy begins. There are laws already in place, however, that regulate the collection of personally identifiable information (PII).

“An area of concern would be if we unintentionally gather data from people using our tool and they accidentally put personally identifiable information there,” Buhler added. “That's a problem because we don't have ways to delete it, and there are all these laws that require systems in place to delete personally identifiable information.”

How LLMs use data

The Garante—known in plain English as the Italian Data Protection Authority—swiftly moved to halt ChatGPT from scouring the internet and learning from Italian citizens’ data, which includes information Italians have posted on social media platforms. 

The Garante cited the European Union’s General Data Protection Regulation (GDPR), which has clearly spelled-out and stringent laws regarding the PII of people living in the EU. Other EU member states, including Germany and France, have also expressed concern regarding how OpenAI’s LLM collects data from which it learns to create organic language responses. Currently, Italy has allowed ChatGPT to resume service in its jurisdiction following OpenAI’s compliance with Italian regulators’ concerns.

The “GPT” in “ChatGPT” is an acronym for “generative pre-trained transformer,” which describes how the AI is trained. GPT models use a transformer neural network architecture trained on massive datasets of text sourced from books, scripts, websites, news reports, social media posts, and other written communication available on the internet. While the information ChatGPT trains on in an unsupervised learning process is publicly available online, the speed and capacity with which the technology can scour the internet and expand its knowledge base concerns experts.
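The pre-training objective described above, predicting the next token from the text that precedes it, can be loosely illustrated with a toy word-pair counter in Python. This is a sketch of the learning objective only; real GPT models use transformer neural networks trained on billions of tokens, not simple counts:

```python
from collections import Counter, defaultdict

# Toy illustration of next-token "pre-training": count which word follows
# which in a tiny corpus, then predict the most likely continuation.
# Real GPT models learn these statistics with a transformer neural network
# over massive datasets; this bigram counter mimics only the objective.
corpus = (
    "the model learns from text . "
    "the model predicts the next word . "
    "the model improves with more data ."
).split()

follow_counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follow_counts[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the most frequently observed continuation of `word`."""
    return follow_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # prints "model"
```

The privacy concern follows directly from this mechanic: whatever text ends up in the training data, including personal details, shapes what the model can later reproduce.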

The exact dataset developers used to train ChatGPT has not been publicly disclosed, and experts can only offer their best guesses. This uncertainty is part of what has created privacy concerns. For example, a forum response a person posted two decades ago while in high school could reveal to an LLM what town the poster is from, how old they are, or other details a human would struggle to find by combing the internet.

It’s this unsupervised learning that the Garante immediately took issue with, along with regulators in other EU member states. Currently, ChatGPT can answer queries related to public figures, including movie stars, politicians, journalists, and the like. But as the LLM continues to teach itself about real people, events, and ideas, the possibility exists that it could collect information about you or your neighbor.

Despite the litany of bureaucratic challenges and potential legislative pushback, Sam Altman, CEO of OpenAI, has pledged not to take ChatGPT off the European market.

A screenshot of ChatGPT answering a prompt about the privacy risks it poses.

ChatGPT itself is aware of the data privacy risks associated with sharing one’s PII and sensitive information. The LLM cites concerns such as the collection and storage of user data and the challenges enterprises face to remain compliant despite the legal ambiguity.

Despite these concerns, software companies are integrating ChatGPT and other LLMs into SaaS products at lightning speed for enhanced user experiences, such as sentiment analysis and text-to-speech functionality. That pace is what has caused the EU to pump the brakes on this breakthrough technology.

Data privacy in the US

Unlike the EU, Canada, Brazil, Japan, and other governing bodies that have enacted national data privacy laws to protect the PII and sensitive data of their citizens, the 50 US states have been left to create their own policies. 

Source: DataGrail

The California Consumer Privacy Act (CCPA) of 2018 is uniformly seen as the US legislation most analogous to the GDPR. When American-based national and multinational organizations, including G2, create data privacy policies for their employees, vendors, customers, and other end users, the CCPA is often regarded as the highest data privacy standard with which they aim to comply.

Tip: You can request to have your data deleted.

The CCPA stipulates that individuals have a right to data deletion and allows for data to be deleted through a consumer request. Commonly collected information includes preferred language, digital accessibility requirements, a history of items in your shopping cart, and more. After receiving a verifiable request, businesses have 45 days to delete an individual’s data. However, it is currently unknown how ChatGPT can adhere to this compliance standard.
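As a rough sketch of what that 45-day window implies operationally, a compliance team could track response deadlines for verified requests like this (the function name and dates are illustrative, not part of any real system):

```python
from datetime import date, timedelta

# Illustrative sketch only: the CCPA gives businesses 45 calendar days
# to act on a verifiable consumer request after receiving it.
CCPA_RESPONSE_WINDOW = timedelta(days=45)

def deletion_deadline(request_received: date) -> date:
    """Return the date by which a verified deletion request must be honored."""
    return request_received + CCPA_RESPONSE_WINDOW

# A request verified on June 1, 2023 must be honored by July 16, 2023.
print(deletion_deadline(date(2023, 6, 1)))  # prints 2023-07-16
```

The calendar math is the easy part; the open question the article raises is whether data absorbed into an LLM can be located and deleted at all within that window.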

Without a national legislative data privacy standard in the United States, legal counsels across industries are advising senior leadership teams to remain enthusiastic about the technology while proceeding with caution.

“Legal ramifications or litigation related to ChatGPT have yet to come out,” Buhler said. “That's why we want to take a bit of a cautionary approach: because it's unknown unknowns.”

While unlikely, there exists the possibility that companies could unintentionally collect PII and other sensitive information from consumers or employees through ChatGPT-powered chatbots on company websites. Currently, there is no legal guidance as to what to do with the information collected voluntarily through end-user contributions.
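One partial mitigation, assuming a company controls its chatbot’s input pipeline, is to scrub obvious PII patterns from user text before it is logged or forwarded to the LLM. The patterns and function below are illustrative only and are no substitute for a real compliance program:

```python
import re

# Illustrative sketch: redact common PII patterns (email, US SSN, US phone)
# from user input before it reaches an LLM-powered chatbot or its logs.
# Real compliance programs need far more than regexes.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace matches of each PII pattern with a [REDACTED:<kind>] tag."""
    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{kind}]", text)
    return text

print(redact_pii("Reach me at jane.doe@example.com or 555-867-5309."))
```

Filtering at intake sidesteps the harder problem the next paragraphs describe: once PII reaches the model, there is no straightforward way to get it back out.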

Further compounding this legal dilemma is the question of how to delete data voluntarily shared with ChatGPT; there is no straightforward way to do so, yet companies that hold themselves to CCPA standards must figure out a way to share that information with data subjects upon request and delete it if asked to do so.

Even more perplexing is the case of publicly available information, which can either be manually fed to ChatGPT or be scoured from the internet by ChatGPT on its own. The law is currently unclear and undecided on whether it is permissible for ChatGPT-powered technology to republish publicly available information, which may include text that users have freely shared on social media.

The power of generative AI comes with a price

Throughout the ongoing investigations into the power of generative AI, its potentially malicious applications, and the general data privacy concerns associated with collecting PII and other sensitive data, OpenAI has clearly spelled out ChatGPT’s data collection processes.

The data collection policy states, “A large amount of data on the internet relates to people, so our training information does incidentally include personal information.” So information anyone has posted online, whether on social media or a chatroom thread that still exists somewhere, could potentially be used to train ChatGPT. The data collection policy goes on to state that ChatGPT doesn’t “actively seek out personal information to train our models.”

In summary, a good practice is this: don’t share anything with ChatGPT that you wouldn’t share on a social media profile. Sometimes it’s best to keep things to yourself.

If you're interested in learning more about the CCPA, here's everything you need to know.

Edited by Jigmee Bhutia

Brandon Summers-Miller Brandon is a Senior Research Analyst at G2 specializing in security and data privacy. Before joining G2, Brandon worked as a freelance journalist and copywriter focused on food and beverage, LGBTQIA+ culture, and the tech industry. As an analyst, Brandon is committed to helping buyers identify products that protect and secure their data in an increasingly complex digital world. When he isn’t researching, Brandon enjoys hiking, gardening, reading, and writing about food.