Hüvelyes Péter
06/13/2023

The relationship of ChatGPT and GDPR

The concept of artificial intelligence (AI) is not new; originally appearing in science fiction novels, AI has been in development for decades, with the first working chatbot appearing as early as 1966. The reason we hear about it all the time now is that, with ChatGPT, anyone can interact with AI for the first time. Furthermore, AI development has reached the stage where it is capable of lifelike conversation. ChatGPT is one of the first publicly available solutions, and perhaps the best performer on the Turing test, which measures how hard it is to tell whether there is a machine on the other side. After a general introduction to AI development, this blog article will focus on ChatGPT and its privacy aspects.

Artificial intelligence is undeniably a major development: it can simulate the thinking of the human mind, yet analyse far more information, far faster. Still, its data protection aspects receive much less attention, despite the fact that the GDPR devotes specific provisions to decisions based solely on automated processing of personal data.

Why is privacy important in AI chatting?

Modern AI systems are not based on predefined rules but on natural language learning techniques. They use conversational data from multiple sources to achieve more lifelike dialogue, but this also introduces new security risks.

Users are often unaware of how their sensitive personal information is handled, used, stored and shared. In addition, as with any system, online AI systems may have vulnerabilities that can be exploited to leak personal data, which attackers can then steal and use for malicious purposes.

Machine learning systems are often used to analyse the behaviour of users in order to create user profiles. A profile of a user can also provide information about specific personal characteristics and personality traits, which can be used to uniquely identify the person.

The background of data protection

AI developed with language learning techniques works by processing, analysing and storing as much data as possible, including personal data. However, in order to comply with data protection principles, companies developing AI must clearly define the purpose of data collection from the beginning and limit the collection and storage of personal data. They should ensure that users' personal data is not stored and used for profit without their knowledge and consent.

The scope of collected and processed personal data

The analysis of large amounts of high-quality data is key to the development of AI chatting tools, which is why AI chatbots collect a wide range of personal data, such as:

  • full names
  • geographic addresses, geo-location (coordinates)
  • e-mail addresses
  • phone numbers
  • personal preferences.

Of course, the more companies know about their customers, the better customer experience they can provide, but the same data is also used for other purposes, such as improving targeted advertising.

GDPR requirements

AI chatbots are necessarily covered by the GDPR, as they can access the personal data of users in the EU. From a data protection perspective, the main concern is the extensive collection of personal data, which conflicts with the GDPR's data minimisation and purpose limitation principles.

ChatGPT’s GDPR compliance

The Data Privacy Policy of OpenAI (which covers ChatGPT) details the steps the company takes to comply with the California Consumer Privacy Act. However, it provides relatively little detail about compliance with the EU GDPR, despite highlighting that personal data of EU citizens is also processed by servers in the US. There is a dedicated channel for answering specific questions about the processing of users' personal data, and users can request the deletion or restriction of their data. However, the Privacy Policy only provides general information on the use of personal data, data retention policies and third-party access.

GDPR compliance of OpenAI's privacy policy is questionable in the following areas:

Data minimisation

The ChatGPT model was trained using reinforcement learning aided by human feedback, which required data from the World Wide Web. During such training, the model is taught to assign specific inputs to desired outputs using labelled data. A detailed demonstration of the ChatGPT training process can be found on the OpenAI blog:

Figure 1: ChatGPT's learning process (source: OpenAI.com)

The GDPR requires that solutions using personal data process only the minimum amount of information needed, and that it be obtained lawfully, fairly and transparently. Machine learning typically requires a much larger amount of raw data than the human brain in order to effectively identify patterns and build decision models based on them. Billions (!) of web-based data points were required to train and develop the ChatGPT AI engine. OpenAI's blogs are not explicit about the data sources, in particular how users' personal data was used and is being used during the learning process.
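To make the data minimisation principle more concrete: before any collected text is stored or reused, the most obvious identifiers it contains can be stripped out. The sketch below is a minimal, illustrative Python example of that idea, not OpenAI's actual pipeline; the regex patterns are deliberately simplified assumptions that only cover two of the data categories listed above.

```python
import re

# Deliberately simplified patterns; real personal data detection is much harder
# (names, addresses and free-text identifiers are not caught here).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def minimise(text: str) -> str:
    """Redact obvious personal identifiers before the text is stored or reused."""
    text = EMAIL_RE.sub("[EMAIL REDACTED]", text)
    text = PHONE_RE.sub("[PHONE REDACTED]", text)
    return text

if __name__ == "__main__":
    raw = "Contact me at jane.doe@example.com or +36 30 123 4567 about the offer."
    print(minimise(raw))
    # Contact me at [EMAIL REDACTED] or [PHONE REDACTED] about the offer.
```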

Purpose limitation

Purpose limitation is a fundamental principle of the GDPR, which states that personal data can only be used for specified, explicit and legitimate purposes. The data processed by ChatGPT is also used for machine learning purposes.

This means that if a company processes users' personal data and at the same time uses ChatGPT, the company must clearly communicate this and users must agree to their data being used for ChatGPT training.
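As an illustration of what that obligation could look like in practice, the sketch below gates any transfer of a customer's data to an external AI service behind an explicit, recorded consent flag. It is a hypothetical Python example: the class, field and function names are assumptions made for the sake of the sketch, not part of any real OpenAI or ChatGPT API.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    # Consent for AI/ChatGPT processing, recorded separately from other purposes,
    # in line with the purpose limitation principle.
    consented_to_ai_processing: bool

class ConsentError(Exception):
    """Raised when data would be used for a purpose the user never agreed to."""

def prepare_ai_request(customer: Customer, prompt: str) -> dict:
    """Build a request for an external AI service only if consent was given."""
    if not customer.consented_to_ai_processing:
        raise ConsentError(
            f"Customer {customer.customer_id} has not consented to AI processing."
        )
    # Hypothetical payload; a real integration would also record the purpose
    # and retention period so the processing remains demonstrably compliant.
    return {"customer_id": customer.customer_id, "prompt": prompt}

if __name__ == "__main__":
    alice = Customer("c-001", consented_to_ai_processing=True)
    print(prepare_ai_request(alice, "Summarise my last support ticket."))

    bob = Customer("c-002", consented_to_ai_processing=False)
    try:
        prepare_ai_request(bob, "Recommend a product for me.")
    except ConsentError as err:
        print(err)
```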

Accountability

The GDPR contains specific provisions on automated decision-making involving personal data, making it the first major legal instrument to explicitly address the legal effects of algorithmic decisions.

The GDPR also emphasises that automated decision-making software must be able to justify its decisions. While ChatGPT can undoubtedly provide valuable insights on a variety of topics, in order to comply with the GDPR, organisations using AI engines must be able to support its reasoning.

Data shared with third parties

The GDPR requires that users always be informed about data shared with third parties and told exactly what the data will be used for, especially if the sharing is for profit. Such data-sharing practices should require users' consent and ensure that users are free to decide whether or not to give it. However, OpenAI's data policy states:

"In certain circumstances we may provide your Personal Information to third parties without further notice to you, unless required by the law"

No further details can be found in the policy on how data sharing with third parties will be reconciled with the requirements of the GDPR. However, it provides details on the types of third parties with which OpenAI may share users' data:

  • OpenAI's external suppliers and service providers, i.e. any third party that supports OpenAI in its business. These include hosting providers, cloud service providers, email and newsletter services, web analytics services, and other unspecified IT service providers.
  • In the event of business reorganisations (e.g. strategic restructuring, transfer of services, bankruptcy), user data may be shared with the other party involved.
  • Organisations under joint management with OpenAI that are bound by the OpenAI Privacy Policy.
  • In case of legal requirements (where OpenAI is required by law to share user data).

Integrity and confidentiality

Under the GDPR, technical and logical measures must be taken to ensure the separation of data sources, which is particularly critical when data is shared with third parties.

ChatGPT is able to aggregate information across different data sources and draw conclusions from them. If an organisation wants to make a decision about a customer based on his or her data, and the tool's recommendations contain information aggregated from other customers' data, this may violate the GDPR. If, on top of that, this knowledge is shared with third parties, it may constitute an additional GDPR violation.

Future considerations

Artificial intelligence is still in its infancy, much as the internet was twenty or thirty years ago. Online conversations are just the beginning of what AI is capable of. Although we can expect AI to develop and spread rapidly, it is very difficult to predict its long-term impact. It is not necessarily something to be feared, but we need to be aware of the risks it entails and be cautious when sharing information.

The EU GDPR applies to ChatGPT as well, and it is much stricter than US data protection laws. To comply, the OpenAI team needs to ensure greater transparency in the collection, storage and sharing of user data. More importantly, there is also a requirement to be able to "explain" ChatGPT's results.

If the tool is to be used in health, financial and other critical areas, OpenAI should work with regulators to develop additional controls to protect user data.
