Safety Policies for Legal Use of Generative AI: Lessons from Mata v. Avianca

Imagine spending hours drafting a brilliant legal brief, only to have it rejected because the cases you cited never existed. This isn't a hypothetical nightmare; it happened in real life and changed how lawyers use technology forever. The case is Mata v. Avianca, a landmark 2023 federal court case where attorneys were sanctioned for submitting fabricated legal citations generated by ChatGPT. It serves as the ultimate warning label for anyone relying on artificial intelligence in high-stakes professional work.

In this article, we break down exactly what went wrong, why large language models (LLMs) make up facts so confidently, and how you can build bulletproof safety policies for your team. We will move beyond vague advice like 'be careful' and give you specific protocols, tools, and checklists used by top law firms today.

The Mata v. Avianca Incident: A Case Study in Failure

To understand the risk, we first need to look at the mechanics of the failure. In 2022, Roberto Mata sued Avianca airline for a knee injury sustained on a flight. The airline moved to dismiss the case, arguing it was time-barred under the Montreal Convention. The defense was strong procedurally, but the plaintiff’s attorneys, Peter LoDuca and Steven Schwartz, needed precedents to argue that the statute of limitations should be tolled (paused).

Instead of using standard legal research databases, attorney Steven Schwartz turned to ChatGPT, an early version of OpenAI's conversational AI model trained on internet text through 2021. He asked the AI for relevant case law. ChatGPT generated six detailed cases, complete with procedural histories and judicial holdings that perfectly supported Mata’s argument. Cases like Varghese v. China Southern Airlines and Martinez v. Delta Air Lines sounded authentic. They had plausible names, plausible dates, and logical reasoning.

Here is the critical error: Schwartz did not verify these citations. He assumed that because the AI presented them with confidence, they were real. After the opposition brief was filed in March 2023, Avianca’s lawyers reported that something was off: they could not locate the cited cases in standard legal research databases. Judge P. Kevin Castel issued an order to show cause, and upon investigation, it became clear the cases were entirely fabricated.

The consequences were severe. On June 22, 2023, the judge imposed a $5,000 penalty, jointly and severally, on the two attorneys and their law firm, payable to the court registry, and ordered them to notify the judges who had been falsely named as authors of the fake opinions. More importantly, the ruling made one point unmistakable: attorneys are responsible for verifying all AI-generated content before filing it. This wasn't just a tech glitch; it was an ethical breach of professional responsibility.

Why Did This Happen? Understanding Hallucination Risk

You might wonder, "How could an advanced AI make such a basic mistake?" The answer lies in how Large Language Models (LLMs) work. They are AI systems designed to predict the next word in a sequence based on statistical probability rather than factual truth. Unlike a search engine that retrieves stored data, an LLM generates text by predicting what sounds most likely given the context.

This architecture creates a phenomenon known as hallucination. When asked for specific, verifiable facts, like a court case name, an LLM doesn't check a database. It constructs a response that mimics the structure and tone of legal writing. If the training data contains fragments of similar cases, the model stitches them together into a plausible but false narrative.
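The stitching behavior can be illustrated with a toy example. The sketch below is not how a real LLM works internally; it is a deliberately simplified stand-in that recombines fragments. The party names, the reporter list, and the `KNOWN_CASES` set are all made up for illustration. What it shows is that output can be perfectly citation-shaped while almost none of it exists in a verified source.

```python
import itertools
import re

# Hypothetical fragments a model might have absorbed from training text.
PLAINTIFFS = ["Rivera", "Chen", "Okafor"]
DEFENDANTS = ["Northwind Air", "Blue Mesa Airlines", "Harbor Jet"]
REPORTERS = ["F.3d", "F. Supp. 2d"]

# Stand-in for a verified database. This entry is a placeholder, not a real case.
KNOWN_CASES = {"Chen v. Northwind Air, 100 F.3d 200"}

# Checks only the *shape* of a citation: "Party v. Party, volume reporter page".
CITATION_SHAPE = re.compile(r"^[A-Z][\w .]+ v\. [A-Z][\w .]+, \d+ [A-Z][\w. ]+ \d+$")


def stitch_citations():
    """Yield every citation-shaped string the toy 'model' could assemble."""
    for plaintiff, defendant, reporter in itertools.product(
        PLAINTIFFS, DEFENDANTS, REPORTERS
    ):
        yield f"{plaintiff} v. {defendant}, 100 {reporter} 200"


candidates = list(stitch_citations())
well_formed = [c for c in candidates if CITATION_SHAPE.match(c)]
verified = [c for c in candidates if c in KNOWN_CASES]

print(f"{len(well_formed)} of {len(candidates)} look like citations")
print(f"{len(verified)} of {len(candidates)} exist in the verified set")
```

Every candidate passes the format check, yet only the one placeholder entry is found in the stand-in database. Looking right and being real are separate tests, and only the second one matters in a filing.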

Research attributed to Stanford University's Center for Research on Foundation Models highlights the scale of this issue. It reportedly shows that LLMs hallucinate factual information in 15-20% of responses when dealing with domain-specific questions outside their core training data. For precise citation requests, accuracy drops even further. In a June 2023 study attributed to the University of Chicago Law School, GPT-4 reportedly generated completely fabricated cases 72% of the time when asked for specific legal precedents.

The danger is compounded by the AI's tone. As one New York litigator noted in a Clio review, "ChatGPT's tone mimics legal authority so well that junior associates don't question its outputs." This is called automation bias: the psychological tendency to trust machine output over human judgment. Without explicit safeguards, this bias leads professionals to skip verification steps, assuming the tool knows better than they do.

General-Purpose AI vs. Specialized Legal Tools

Not all AI is created equal. The root cause of the Mata incident was using a general-purpose chatbot for a task requiring specialized, verified data. To build effective safety policies, you must distinguish between these two categories:

Comparison of General-Purpose AI vs. Specialized Legal AI
Feature	General-Purpose AI (e.g., ChatGPT)	Specialized Legal AI (e.g., Westlaw Precision)
Data Source	Internet text (unverified, cut-off date)	Verified legal databases (40,000+ sources)
Citation Accuracy	Low (~28% for specific cases, per the study cited above)	High (>99.8%, per vendors' internal validation)
Verification Mechanism	None (statistical prediction)	Editorial oversight + database cross-reference
Primary Use Case	Drafting, brainstorming, summarization	Legal research, citation generation
Risk Level	High (Hallucination prone)	Low (Grounded in primary sources)

Tools like Westlaw Precision, Thomson Reuters' legal research platform launched in 2022, and Lexis+ AI, LexisNexis's generative AI platform that became generally available in October 2023, operate differently. They are built on "walled gardens" of verified content. When you ask Westlaw for a case, it doesn't generate one from scratch; it searches its index of 1.3 billion searchable documents and returns actual links to primary sources. Internal validation studies show citation accuracy rates exceeding 99.8% for these platforms, though vendor-reported figures deserve the same scrutiny as any other claim, and output from these tools still has to be checked against the linked sources.

The lesson here is simple: Never use a general-purpose LLM for tasks that require verifiable facts. Use them for drafting memos, brainstorming arguments, or summarizing long documents, but always treat the output as a first draft, not a final product.

Building Your AI Safety Policy: A Step-by-Step Guide

So, how do you protect your firm or organization from making the same mistakes as Mata v. Avianca? You need a structured policy. Based on guidelines from the American Bar Association (ABA) and best practices from Am Law 100 firms, here is a practical framework.

1. The Verification Protocol

The single most important rule is: Verify everything. The ABA's Formal Opinion 512, issued in July 2024, makes clear that lawyers who use generative AI must understand the tool's limits, supervise its use, and independently verify its output. Here is a concrete workflow:

Cross-Reference Citations: If an AI provides a case name, statute, or quote, you must locate the original source. Do not rely on the AI's summary. Check it against Westlaw, LexisNexis, or PACER.
Use the 'Two-Person Rule': Require dual verification for all AI-generated content intended for client delivery or court filing. One person drafts/reviews the AI output, and a second senior attorney verifies the facts.
Document the Process: Keep a memo noting which tools were used and how verification was performed. This protects you in case of future disputes about competence or diligence.
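The workflow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production tool: the `REPORTER_CITE` pattern covers only a handful of federal reporter formats, the `lookup` callable is a hypothetical stand-in for the result of a human search in Westlaw, LexisNexis, or PACER (no real API is implied), and the citations in the usage lines are placeholders.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Rough pattern for citations such as "123 F.3d 456". Illustrative only;
# real citation formats are far more varied than this.
REPORTER_CITE = re.compile(
    r"\b\d{1,4} (?:U\.S\.|F\.(?:2d|3d|4th)|F\. Supp\.(?: 2d| 3d)?) \d{1,4}\b"
)


@dataclass
class CitationCheck:
    citation: str
    found_in_source: bool           # result of locating the original source
    drafter: str                    # person who worked with the AI output
    verifier: Optional[str] = None  # second reviewer under the two-person rule

    def cleared_for_filing(self) -> bool:
        # Cleared only if the source was found AND a different person signed off.
        return (
            self.found_in_source
            and self.verifier is not None
            and self.verifier != self.drafter
        )


def extract_citations(text: str) -> List[str]:
    """Pull citation-shaped strings out of a draft so each one can be checked."""
    return REPORTER_CITE.findall(text)


def review_draft(
    text: str,
    lookup: Callable[[str], bool],
    drafter: str,
    verifier: Optional[str] = None,
) -> List[CitationCheck]:
    """Build one check record per citation; the list doubles as the process memo."""
    return [
        CitationCheck(cite, lookup(cite), drafter, verifier)
        for cite in extract_citations(text)
    ]


# Placeholder data: pretend only the first citation was located by the reviewer.
located = {"123 F.3d 456"}
draft = "Plaintiff relies on 123 F.3d 456 and 789 F. Supp. 2d 1011."
checks = review_draft(
    draft, lambda c: c in located, drafter="associate", verifier="partner"
)

for check in checks:
    status = "cleared" if check.cleared_for_filing() else "HOLD"
    print(f"{check.citation} -> {status}")
```

The design choice worth noting is that a citation is never cleared by default. It needs both a positive lookup and a second, different reviewer, which mirrors the cross-reference and two-person steps, while the list of `CitationCheck` records serves as the documentation step.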

2. Tool Selection Criteria

Not every AI tool is suitable for every task. Before adopting a new tool, evaluate it against these criteria:

Source Transparency: Does the tool provide direct links to primary sources? If it generates text without citing where the information came from, flag it as high-risk.
Training Data Recency: Is the model trained on current data? Legal landscapes change quickly. An AI trained on data from 2021 may miss recent legislative changes.
Domain Specificity: Prefer tools built for your industry. A medical AI won't understand legal nuance, and a legal AI won't understand coding syntax.
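These three criteria can be turned into a simple screening function. The sketch below is one possible encoding, not a standard: the `ToolProfile` fields, the `risk_flags` helper, and the one-year staleness threshold are all assumptions chosen for illustration, and the two tools in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class ToolProfile:
    name: str
    links_to_primary_sources: bool  # source transparency
    data_current_through: date      # training or index recency
    built_for_legal_work: bool      # domain specificity


def risk_flags(
    tool: ToolProfile, today: date, max_staleness_days: int = 365
) -> List[str]:
    """Return the criteria a tool fails; an empty list means no flags raised."""
    flags = []
    if not tool.links_to_primary_sources:
        flags.append("no source transparency")
    if (today - tool.data_current_through).days > max_staleness_days:
        flags.append("stale data")
    if not tool.built_for_legal_work:
        flags.append("not domain specific")
    return flags


# Hypothetical tools used only to exercise the function.
today = date(2024, 1, 15)
chatbot = ToolProfile("general chatbot", False, date(2021, 9, 1), False)
research = ToolProfile("legal research platform", True, date(2024, 1, 1), True)

for tool in (chatbot, research):
    print(tool.name, "->", risk_flags(tool, today) or "no flags")
```

A tool with no flags is not automatically safe. The screen only decides which tools are worth evaluating further, and output from any tool still goes through the verification protocol.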

3. Training and Culture

Technology alone won't fix bad habits. You need to train your team to recognize automation bias. The New York County Lawyers' Association recommends a minimum 15-minute verification process per AI-generated citation. Train associates to ask: "Does this sound too perfect?" Often, hallucinations are smooth and coherent, while real legal texts are messy and complex.

Consider implementing mandatory continuing legal education (CLE) on AI ethics. The New York State Bar Association has recommended this since late 2023. Ensure every employee understands that using AI is not a shortcut. It is a different workflow that requires more rigorous checking, not less.

Practical Implementation Checklist

To help you get started, here is a quick checklist for integrating AI safely into your daily operations:

[ ] Define Permitted Uses: Clearly state what AI can be used for (e.g., drafting emails, summarizing contracts) and what it cannot (e.g., generating case citations, providing legal advice).
[ ] Select Verified Tools: Subscribe to enterprise-grade platforms like Westlaw Precision or Lexis+ AI that offer source verification.
[ ] Create an AI Use Log: Adopt a system where employees log significant AI-assisted work. This helps track patterns and identify potential issues early.
[ ] Implement Client Disclosure: Inform clients if AI is being used in their matter. Transparency builds trust and manages expectations regarding cost and quality.
[ ] Conduct Regular Audits: Review AI-generated work product every quarter to confirm that the verification protocols are being followed.
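Several checklist items (permitted uses, the AI use log, client disclosure, and audits) fit naturally into one small data structure. The following is a minimal sketch, assuming a firm keeps its log as structured records. The field names, the `audit` helper, and the sample matter IDs are hypothetical, and the permitted and prohibited uses are taken from the examples in the checklist.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Policy lists taken from the checklist examples; a real policy would be longer.
PERMITTED_USES = {"drafting emails", "summarizing contracts"}
PROHIBITED_USES = {"generating case citations", "providing legal advice"}


@dataclass
class AIUseLogEntry:
    matter_id: str
    tool: str
    purpose: str
    logged_on: date
    verified_by: Optional[str] = None  # who checked the output, if anyone
    client_informed: bool = False


def audit(entries: List[AIUseLogEntry]) -> List[str]:
    """Return human-readable findings for a periodic (e.g. quarterly) review."""
    findings = []
    for entry in entries:
        if entry.purpose in PROHIBITED_USES:
            findings.append(f"{entry.matter_id}: prohibited use ({entry.purpose})")
        elif entry.purpose not in PERMITTED_USES:
            findings.append(f"{entry.matter_id}: use not covered by policy ({entry.purpose})")
        if entry.verified_by is None:
            findings.append(f"{entry.matter_id}: no verification recorded")
        if not entry.client_informed:
            findings.append(f"{entry.matter_id}: client disclosure missing")
    return findings


# Hypothetical log entries for illustration.
log = [
    AIUseLogEntry("M-001", "chatbot", "summarizing contracts", date(2024, 2, 1),
                  verified_by="senior associate", client_informed=True),
    AIUseLogEntry("M-002", "chatbot", "generating case citations", date(2024, 2, 3)),
]

for finding in audit(log):
    print(finding)
```

Keeping the policy lists next to the log means the audit can flag uses the policy never anticipated, not just the ones it forbids, which is how patterns surface early.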

Future-Proofing Your Practice

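The landscape is evolving rapidly. By 2026, the legal AI market is projected to reach $3.8 billion, driven by demand for safer, more reliable tools. Courts are catching up. There is no single nationwide rule yet, but since 2023 a number of individual federal judges have issued standing orders requiring attorneys to disclose or certify their use of AI in filings, so check the assigned judge's practices and the local rules before you file. Meanwhile, Rule 11 of the Federal Rules of Civil Procedure already requires attorneys to certify that their legal contentions are warranted by existing law, and it was the rule applied in Mata.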

Firms that ignore these trends face significant risks. Data from ALM Intelligence suggests that firms lacking proper AI safeguards face 3.7 times higher malpractice claim rates compared to those with robust verification protocols. Conversely, firms that implement strong safety measures see an 18-22% competitive advantage through increased productivity without ethical breaches.

The key takeaway from Mata v. Avianca is not that AI is dangerous, but that unsupervised AI is dangerous. With the right policies, tools, and mindset, you can harness the power of generative AI while protecting your reputation and your clients. Treat AI as a powerful intern: incredibly fast and knowledgeable, but one who needs every fact double-checked before it goes out the door.

What exactly happened in Mata v. Avianca?

In Mata v. Avianca, attorneys submitted a legal brief containing six fabricated case citations generated by ChatGPT. They had not verified that the cases existed in traditional legal research databases. The judge imposed a $5,000 penalty, jointly and severally, on the two attorneys and their firm. In a separate ruling, the court dismissed the underlying claim as time-barred.

Why do AI models hallucinate legal citations?

AI models like ChatGPT are Large Language Models (LLMs) that predict text based on statistical probability, not factual retrieval. They lack direct access to verified legal databases and generate plausible-sounding but non-existent cases to fulfill the user's request for a specific format.

Is it safe to use ChatGPT for legal research?

No, it is not safe to use general-purpose ChatGPT for primary legal research or citation generation without rigorous verification. The study cited above reportedly found fabricated cases in over 70% of specific citation requests. Prefer specialized legal AI tools like Westlaw Precision or Lexis+ AI, which are grounded in verified databases, and verify their output against the primary sources as well.

What are the ABA's guidelines on AI use?

The American Bar Association's Formal Opinion 512, issued in July 2024, states that lawyers may use generative AI provided they understand the technology well enough to use it competently, protect client confidentiality, supervise its use, verify its output against primary sources, and communicate with clients about it where appropriate. All AI-generated content must undergo independent verification before filing.

How can I prevent automation bias in my team?

Prevent automation bias by implementing a 'two-person rule' for verification, requiring manual cross-referencing of all AI outputs, and training staff to recognize that AI confidence does not equal accuracy. Encourage a culture where questioning AI results is rewarded, not penalized.