GDPR + AI: The Specific Rules Most Companies Are Violating
Most companies deploying AI systems with EU personal data are violating GDPR right now - and regulators have started enforcing. Between December 2024 and early 2025 alone, European data protection authorities levied over €65 million in AI-related fines. The violations are not obscure edge cases. They stem from specific, well-documented GDPR articles that most organizations either misunderstand or ignore when adopting AI tools.
The Seven GDPR Articles That Apply Directly to AI Systems
GDPR was not written with AI in mind, but its principles map precisely onto AI processing risks. The European Data Protection Board confirmed this in Opinion 28/2024 (December 2024), stating that AI models trained on personal data must, in most cases, be considered subject to the GDPR. Here are the specific articles your AI deployments must satisfy.
Article 5: Data Minimization and Purpose Limitation
Article 5(1)(b) requires that personal data be "collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes." Article 5(1)(c) demands data minimization - collecting only what is "adequate, relevant and limited to what is necessary."
Most cloud AI services violate both provisions by default. When employees paste customer data into ChatGPT, that data enters a system designed for general-purpose language model improvement - a purpose incompatible with the original collection purpose. OpenAI's own terms historically allowed training on user inputs, creating a purpose limitation violation for any business data entered.
Article 6: Lawful Basis for Processing
Every act of processing requires one of six legal bases under Article 6(1). Italy's Garante found OpenAI in violation of this article when it fined the company €15 million in December 2024 - the first generative AI fine under GDPR - specifically because OpenAI lacked an appropriate legal basis for processing personal data of both users and non-users to train ChatGPT.
Article 9: Special Categories of Data
Processing data revealing racial or ethnic origin, health data, biometric data, political opinions, or religious beliefs requires explicit consent or another narrow exception under Article 9(2). AI systems that ingest unstructured text - emails, support tickets, medical notes - almost inevitably encounter special category data. Without explicit consent for AI processing of these categories, organizations face the highest tier of GDPR penalties: up to €20 million or 4% of global annual turnover.
Articles 13 and 14: Transparency Obligations
Data subjects must be informed about how their data is processed, including the purposes, legal basis, retention period, and existence of automated decision-making. The Garante's action against OpenAI cited transparency failures as a key violation - users were not adequately informed about how their data would be used for model training. When businesses route customer data through third-party AI APIs, they inherit an obligation to disclose this processing in their own privacy notices.
Article 22: Automated Decision-Making
Article 22(1) states: "The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her." This covers AI-driven credit scoring, automated hiring screening, insurance risk assessment, and any system where the AI output directly determines an outcome for an individual. Organizations must provide meaningful information about the logic involved, the significance, and the envisaged consequences.
Article 25: Privacy by Design and Default
Controllers must implement "appropriate technical and organisational measures" to ensure data protection principles are embedded into processing activities from the design stage. Sending data to a third-party cloud AI provider and hoping their DPA covers it is not privacy by design. It is a delegation of a non-delegable obligation.
Article 35: Data Protection Impact Assessment (DPIA)
A DPIA is mandatory before processing that is "likely to result in a high risk to the rights and freedoms of natural persons." The CNIL's 2024 recommendations explicitly state that AI systems processing personal data at scale require a DPIA. This is not optional guidance - it is a regulatory expectation backed by enforcement authority.
Why Cloud AI Services Create GDPR Violations by Default
Using ChatGPT, Google Gemini, or similar cloud AI services to process EU personal data creates multiple overlapping GDPR problems.
The Data Transfer Problem
When personal data is sent to OpenAI or Google's APIs, it typically crosses to US-based servers. Since the Schrems II ruling (July 2020) invalidated the EU-US Privacy Shield, such transfers require Standard Contractual Clauses (SCCs) supplemented by additional safeguards. The EU-US Data Privacy Framework (DPF), adopted in July 2023, provides a mechanism - but only for companies certified under the DPF, and its long-term validity remains under legal challenge.
Even with SCCs or DPF certification, organizations must conduct a Transfer Impact Assessment (TIA) to verify that the recipient country's legal framework provides adequate protection. US surveillance laws, particularly FISA Section 702, remain a concern that regulators have repeatedly flagged.
The Data Processing Agreement Gap
Article 28 requires a binding data processing agreement (DPA) between the controller and any processor. While OpenAI and Google now offer enterprise DPAs, many businesses use consumer or standard-tier AI services without one. Worse, the standard DPAs from cloud AI providers often include broad sub-processor lists and retain rights to use data for model improvement - terms that conflict with GDPR's purpose limitation principle.
The Training Data Problem
The EDPB's Opinion 28/2024 confirmed that AI models trained on personal data are themselves subject to GDPR obligations. This means the model is not just a tool - it is a product of personal data processing. Organizations using models trained on scraped web data without lawful basis inherit downstream compliance risk.
| Risk Factor | GDPR Articles Implicated | Typical Violation | Severity |
|---|---|---|---|
| US data transfers without adequate safeguards | Articles 44-49 | No TIA conducted; reliance on invalidated mechanisms | High |
| No data processing agreement | Article 28 | Consumer-tier AI use without binding DPA | High |
| Training on user inputs | Articles 5, 6 | Purpose limitation breach; no lawful basis for training | High |
| No transparency notice for AI processing | Articles 13, 14 | Privacy policy omits third-party AI processing disclosure | Medium |
| Special category data in prompts | Article 9 | Health/biometric data sent without explicit consent | Critical |
| No DPIA conducted | Article 35 | AI deployed at scale without impact assessment | Medium |
Real Enforcement: What Regulators Are Actually Doing
GDPR enforcement against AI systems is no longer theoretical. Multiple data protection authorities have taken decisive action.
OpenAI / ChatGPT - Italy (2023-2024)
In March 2023, Italy's Garante temporarily banned ChatGPT - making Italy the first country to block the service - citing violations of Articles 5, 6, 13, and 25. OpenAI was ordered to implement age verification, provide clearer privacy notices, and offer an opt-out mechanism for training data. In December 2024, the Garante imposed a €15 million fine, finding that OpenAI processed personal data without an appropriate legal basis and failed to meet its transparency obligations. OpenAI has appealed, and an Italian court suspended the fine in March 2025, but the underlying compliance requirements stand.
Clearview AI - France, Italy, Netherlands (2022-2024)
Clearview AI's facial recognition system drew fines from multiple EU regulators: €20 million from France's CNIL (October 2022), €20 million from Italy's Garante (March 2022), and €30.5 million from the Dutch DPA (September 2024). The violations centered on processing biometric data without lawful basis, failing transparency obligations, and ignoring data subject access requests. The Dutch DPA specifically warned that Clearview's directors could face personal liability.
Meta - EU-Wide (2024-2025)
Meta paused AI model training on EU user data in June 2024 after the Irish DPC raised concerns about the legal basis for using Facebook and Instagram posts as training data. The case illustrates how even the largest tech companies cannot simply claim "legitimate interest" as a blanket justification for AI training on user-generated content.
CNIL Guidance (2024-2025)
France's CNIL published comprehensive recommendations on GDPR-compliant AI development across 2024 and 2025, confirming that legitimate interest is the most likely legal basis for AI developers but requiring a documented three-part balancing test: the interest must be legitimate, the processing must be necessary, and it must not override data subjects' fundamental rights.
Legitimate Interest vs. Consent: The Legal Basis Debate
The question of which Article 6 legal basis applies to AI processing remains the most contested area of GDPR-AI compliance.
Consent (Article 6(1)(a)) provides the strongest legal basis but creates practical problems for AI. Consent must be freely given, specific, informed, and unambiguous. It must be as easy to withdraw as to give. For AI training on large datasets, obtaining and managing granular consent from every data subject is operationally difficult - and a single withdrawal could theoretically require model retraining.
Legitimate interest (Article 6(1)(f)) is the basis most AI developers rely on, and the CNIL's 2025 recommendations acknowledge this. However, legitimate interest requires a documented Legitimate Interest Assessment (LIA) demonstrating:
- Purpose test: The interest pursued is legitimate, specific, and real
- Necessity test: The processing is necessary to achieve the stated purpose
- Balancing test: The interest does not override the fundamental rights and freedoms of data subjects
For internal AI deployments processing employee or customer data, legitimate interest is defensible when the LIA is documented and the processing is proportionate. For training foundation models on scraped web data, the balancing test becomes much harder to satisfy - as Italy's Garante found with OpenAI.
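The three-part test above has to be documented, not just performed. As a rough sketch of what "documented" can mean in practice, the structure below captures each test as an auditable record. The class and field names are illustrative assumptions - GDPR prescribes the tests, not this data model.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical record for a documented Legitimate Interest Assessment.
# One instance per AI processing activity, stored alongside the ROPA.
@dataclass
class LegitimateInterestAssessment:
    processing_activity: str
    purpose_test: str      # why the interest is legitimate, specific, and real
    necessity_test: str    # why this processing is necessary for that purpose
    balancing_test: str    # why data subjects' rights are not overridden
    assessed_on: date = field(default_factory=date.today)
    approved: bool = False

    def is_complete(self) -> bool:
        # An LIA with any test left blank is not a defensible LIA.
        return all([self.purpose_test, self.necessity_test, self.balancing_test])
```

A record like this makes the difference between "we rely on legitimate interest" (a compliance failure on its own, per the CNIL) and a defensible, dated assessment a regulator can inspect.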
What a GDPR-Compliant AI Architecture Actually Requires
Compliance is not a checkbox exercise. It requires architectural decisions that most organizations are not making.
- Data sovereignty: Personal data must remain within jurisdictions that provide adequate protection, or transfers must comply with Chapter V of GDPR. The simplest way to eliminate transfer risk is to process data on infrastructure you control within the EU.
- Purpose-bound processing: AI systems must process data only for the purposes disclosed to data subjects. This means isolated model environments where customer data used for one purpose cannot be reused for another without a new legal basis.
- Technical minimization: Implement data anonymization, pseudonymization, or differential privacy before data enters AI pipelines. Strip identifiers where the AI task does not require them.
- Audit trails: Maintain records of processing activities (Article 30) that document what data entered the AI system, when, under which legal basis, and for what purpose.
- Human oversight: For systems making decisions about individuals, build meaningful human review into the workflow - not as a rubber stamp, but as a genuine override mechanism satisfying Article 22.
- Right to erasure infrastructure: When a data subject exercises their Article 17 right, you need a mechanism to remove their data from AI training sets and, where feasible, from the model itself.
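The technical minimization point above can be made concrete. The sketch below is a minimal, illustrative pseudonymization step - not a production anonymizer - that replaces direct identifiers with stable tokens before a prompt ever reaches an AI service, keeping the mapping locally so responses can be re-identified on return. The regex patterns and token scheme are assumptions for illustration only.

```python
import hashlib
import re

# Hypothetical identifier patterns; a real pipeline would use a vetted
# PII-detection library and cover far more categories.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def pseudonymize(text: str, salt: str = "local-secret") -> tuple[str, dict]:
    """Replace identifiers with tokens; return (redacted_text, mapping)."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        def repl(match):
            value = match.group(0)
            digest = hashlib.sha256((salt + value).encode()).hexdigest()[:8]
            token = f"<{label}_{digest}>"
            mapping[token] = value  # stays on your infrastructure
            return token
        text = pattern.sub(repl, text)
    return text, mapping

def reidentify(text: str, mapping: dict) -> str:
    """Restore identifiers in the AI response, locally, after the call."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

The design point is that re-identification happens only on your side: the third-party processor sees tokens, never the identifiers, which also narrows the data in scope for transfer and erasure obligations.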
Self-hosted AI platforms that run on your own infrastructure - such as Compass AI - eliminate the data transfer and third-party processing risks entirely. When the model runs on your servers, within your security perimeter, the data never leaves your control. This is privacy by design as Article 25 intended: architectural decisions that make compliance the default, not an afterthought.
GDPR AI Compliance Checklist
| GDPR Article | Requirement | How to Comply with AI |
|---|---|---|
| Art. 5 - Data Minimization | Collect only adequate, relevant, limited data | Strip PII before AI processing; use anonymization/pseudonymization pipelines; limit prompt data to what the task requires |
| Art. 6 - Lawful Basis | Establish and document legal basis for processing | Conduct and document a Legitimate Interest Assessment; obtain explicit consent where required; review legal basis for each AI use case |
| Art. 9 - Special Categories | Explicit consent or narrow exception for sensitive data | Implement data classification to detect special categories before AI ingestion; obtain Article 9(2)(a) explicit consent; block sensitive data from general-purpose AI tools |
| Art. 13/14 - Transparency | Inform data subjects about processing | Update privacy notices to disclose AI processing, third-party AI providers, automated decision-making logic, and data retention for AI purposes |
| Art. 22 - Automated Decisions | Right not to be subject to solely automated decisions | Implement human-in-the-loop review for consequential AI decisions; provide explanation mechanism; offer manual alternative process |
| Art. 25 - Privacy by Design | Embed data protection into system design | Deploy AI on controlled infrastructure; default to privacy-preserving configurations; use self-hosted models where feasible to eliminate third-party processor risk |
| Art. 28 - Processor Obligations | Binding DPA with any AI processor | Execute compliant DPA with AI vendors; audit sub-processor lists; verify terms do not permit training on your data |
| Art. 30 - Records of Processing | Maintain processing activity records | Document each AI system in your ROPA: data inputs, outputs, purposes, legal basis, retention, and processor details |
| Art. 35 - DPIA | Impact assessment for high-risk processing | Conduct DPIA before deploying AI processing personal data at scale; document risks, mitigations, and residual risk acceptance |
| Art. 44-49 - International Transfers | Adequate safeguards for cross-border transfers | Use EU-based or self-hosted AI infrastructure; if using US providers, implement SCCs + TIA + supplementary measures; monitor DPF validity |
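Two rows of the checklist - the Article 9 classification gate and the Article 30 processing record - can be wired together at the point where prompts leave your environment. The sketch below is an illustrative keyword screen plus an append-only log record; the term list, record fields, and log path are assumptions, not a validated classifier or a legal determination.

```python
import json
import re
from datetime import datetime, timezone

# Crude illustrative screen for obvious Article 9 special-category signals.
# A real deployment would use a proper classifier, not a keyword list.
SPECIAL_CATEGORY_TERMS = re.compile(
    r"\b(diagnos\w*|prescription|blood type|biometric|fingerprint|"
    r"religio\w*|ethnicity|trade union)\b",
    re.IGNORECASE,
)

def screen_prompt(prompt: str) -> bool:
    """True if the prompt appears free of special-category data."""
    return SPECIAL_CATEGORY_TERMS.search(prompt) is None

def record_processing(purpose: str, legal_basis: str, processor: str) -> dict:
    """Minimal Article 30-style record of one processing activity."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "purpose": purpose,
        "legal_basis": legal_basis,
        "processor": processor,
    }

def guarded_ai_call(prompt: str, purpose: str, legal_basis: str) -> str:
    if not screen_prompt(prompt):
        raise ValueError("Blocked: possible Article 9 special-category data")
    entry = record_processing(purpose, legal_basis, processor="internal-llm")
    # Append-only audit log; replace with your ROPA tooling in practice.
    with open("ai_processing_log.jsonl", "a") as f:
        f.write(json.dumps(entry) + "\n")
    return prompt  # placeholder for the actual model call
```

The point is architectural: the gate and the record live in your code path, so every AI call is either blocked or documented - compliance by default rather than by policy memo.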
The EU AI Act: GDPR's Enforcement Multiplier
The EU AI Act (Regulation 2024/1689), which entered into force in August 2024, creates an additional compliance layer that intersects directly with GDPR. Key dates organizations must track:
- February 2, 2025: Prohibited AI practices ban takes effect (social scoring, real-time biometric surveillance in public spaces, emotion recognition in workplaces/schools)
- August 2, 2025: General-purpose AI model transparency and governance rules apply
- August 2, 2026: Full high-risk AI system requirements become enforceable
For high-risk AI systems - those used in employment, credit scoring, healthcare, law enforcement, and education - the AI Act requires conformity assessments, risk management systems, data governance, technical documentation, and human oversight. These requirements stack on top of GDPR obligations, meaning a single AI deployment could face enforcement under both regulations simultaneously.
Several EU member states have already designated their national DPA as the AI Act supervisory authority, creating a single regulator with enforcement power under both frameworks. This consolidation signals that GDPR and AI Act compliance will be evaluated together, not in isolation.
Organizations running self-hosted AI solutions are better positioned to satisfy both frameworks. When you control the infrastructure, you control the data governance, the audit trail, the risk management documentation, and the human oversight mechanisms - all requirements under both GDPR and the AI Act. Platforms like Compass AI are built on this principle: keeping AI processing within the organization's own environment where compliance controls can be directly implemented and audited.
Frequently Asked Questions
Can I use ChatGPT with business data about people in the EU?
You can, but only with significant safeguards. You need an enterprise-tier agreement with a compliant DPA, confirmed data processing within adequate jurisdictions (or valid SCCs with a TIA), updated privacy notices disclosing AI processing, a documented legal basis under Article 6, and a DPIA. Consumer-tier ChatGPT use with EU personal data is indefensible under current enforcement trends.
Is legitimate interest a valid legal basis for AI processing?
Yes, but only when supported by a documented Legitimate Interest Assessment. The CNIL's 2025 recommendations and EDPB Opinion 28/2024 both acknowledge legitimate interest as a viable basis for AI processing, provided the three-part balancing test is satisfied. The processing must be necessary, proportionate, and must not override data subjects' fundamental rights. Relying on legitimate interest without a documented LIA is itself a compliance failure.
Do I need a DPIA for every AI system?
Not every system, but most that process personal data at scale. A DPIA is mandatory under Article 35 when processing is "likely to result in a high risk" to individuals. AI systems that profile individuals, process special category data, make automated decisions with legal effects, or monitor publicly accessible areas on a large scale all trigger the DPIA requirement. The CNIL explicitly recommends DPIAs for AI systems processing personal data.
What happens if I violate GDPR with AI and the EU AI Act simultaneously?
You face penalties under both frameworks. GDPR fines reach up to €20 million or 4% of global annual turnover. The EU AI Act adds fines of up to €35 million or 7% of global turnover for prohibited practices violations, and up to €15 million or 3% of turnover for other non-compliance. Several member states have assigned their DPA as the AI Act authority, meaning a single investigation could result in enforcement actions under both regulations.
Does self-hosted AI eliminate all GDPR concerns?
Self-hosted AI eliminates data transfer risk and third-party processor risk - two of the largest GDPR compliance gaps. However, you still need a lawful basis for processing, transparency notices, a DPIA where required, data minimization practices, and Article 22 safeguards for automated decisions. Self-hosting solves the architectural problems; the procedural obligations remain. The difference is that with self-hosted infrastructure, those obligations are within your direct control.