
Retrieval-augmented generation in legal tech

James Ju  

· 12 minute read


An introduction to retrieval-augmented generation (RAG), and why data is the new gold while content is still king


 

Highlights: 

  • RAG is a method of using AI to generate text by first retrieving relevant documents and then using those documents as inputs to the AI.
  • RAG improves the accuracy and reliability of AI-generated text, especially in domains where there is a lot of specialized knowledge, such as law.
  • Legal research tools using RAG should be carefully evaluated to ensure that they are using high-quality data and that the models are being tested and benchmarked for accuracy.

 

Once ChatGPT took the world by storm, the legal industry was one of the first professional fields to reap its benefits. Why? The law's large corpus of text, found in documents, contracts, cases, and other primary and secondary sources, was tantamount to striking gold in the Generative AI (GenAI) race for legal research, writing, and more routine legal tasks.

The use of GenAI backed by domain-specific retrieval-augmented generation (RAG) enables a rich level of nuance and expertise for specialized fields such as law. This is the single biggest differentiator for a more trustworthy and professional-grade legal AI assistant.

Jump to ↓

Context and other AI terms

What is RAG?

Retrieval of gold-standard legal content

RAG reliability

 

Context and other AI terms

Important concepts and terms need to be defined before even getting to the significance and implications of RAG’s application in legal work.

Artificial Intelligence (AI)

AI is the broad field of developing computer systems capable of human-like intelligence. The term captures both halves of the idea: computers supply the artificial side, while algorithms simulate, or imitate, the intelligence side.

Logical behavior, decisions, and rules are at the core of human intelligence. This is where it gets interesting for attorneys and legal professionals since they need to follow a set of rules and make logical decisions in order to practice law effectively.

Similarly, AI algorithms are designed to follow logical rules and make decisions based on data and patterns. This allows AI to perform tasks that would typically require human intelligence, such as problem-solving, learning, and decision-making.


 

Machine learning (ML)

Machine learning is a subset of AI that involves training models and computer algorithms to make predictions or decisions without explicit rule-based programming. This allows AI systems to continuously learn and improve from ingested data, making more accurate decisions much as humans learn from experience.

Two training approaches matter here: supervised and unsupervised learning. Supervised learning provides the model with labeled data and desired outcomes, and is used to predict values and variables. Unsupervised learning lets the model identify patterns and make decisions on its own, and is used to make sense of raw, unstructured data.

Both approaches have their own advantages and suit different scenarios depending on the type of data and the desired outcome, and the distinction shapes debates about which legal tasks, and even which work product, can be automated.
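
To make the distinction concrete, here is a minimal sketch in Python (using scikit-learn is an assumption; the article names no toolkit): a supervised classifier learns from labeled examples, while an unsupervised clustering model must find structure in unlabeled data on its own.

```python
# Minimal sketch: supervised vs. unsupervised learning with scikit-learn.
# The data here is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # 100 samples, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels for the supervised case

# Supervised: the model is shown inputs AND the desired outcome (labels).
clf = LogisticRegression().fit(X, y)
print("predicted label:", clf.predict([[0.5, 0.5]]))

# Unsupervised: the model sees only raw inputs and must find structure itself.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
print("cluster assignments:", clusters[:10])
```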

Deep learning

Deep learning is a subfield of machine learning that involves training artificial neural networks with many layers (deep neural networks) to perform tasks such as image and speech recognition.

Natural Language Processing (NLP)

Natural Language Processing (NLP) enables computers to comprehend, interpret, and create human language, bridging the gap between computers and human communication.

Generative AI (GenAI)

GenAI is any model that produces flexible outputs such as text, images, or audio, as opposed to discriminative AI, which handles classification and regression. Where ML lets machines understand and therefore learn, GenAI lets them create: building on the fields defined above, it generates human-like text, images, and even video.


 

Large Language Models (LLM)

LLMs form the bridge between a user's query and the text the model generates. While traditional LLMs deal only with text inputs, more recent multimodal models can handle multiple input types such as images, video, and audio.

LLMs are trained on vast amounts of text and typically contain billions of trainable parameters. Built from deep learning and machine learning techniques, they are, at bottom, complicated formulas that predict the next word.

Overall, LLMs have the potential to greatly assist with legal tasks by automating and expediting processes that would typically require human effort. However, it is important to carefully consider the training data and potential biases and hallucinations when implementing LLMs in legal settings.
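
As a toy illustration of what "predicting the next word" means, the sketch below, with an invented four-word vocabulary and made-up scores, shows how a softmax turns a model's raw scores into a probability for each candidate next token.

```python
# Toy sketch of next-token prediction: a softmax over hypothetical logits.
# The vocabulary and the scores are invented for illustration only.
import math

vocab = ["court", "contract", "banana", "statute"]
logits = [2.1, 1.4, -3.0, 0.7]  # raw scores a trained model might emit

exp = [math.exp(l) for l in logits]
probs = [e / sum(exp) for e in exp]  # softmax: normalize to probabilities

for word, p in sorted(zip(vocab, probs), key=lambda t: -t[1]):
    print(f"{word}: {p:.3f}")
```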

Retrieval-augmented generation (RAG)

Finally, RAG is what truly sets a professional-grade LLM apart from others. Through a process called grounding, the LLM is augmented with industry-specific data that was never part of the development process for mass-market LLMs.

Rather than having the LLM answer a question based on its own memory, it first retrieves relevant documents from a search engine and then uses those documents as inputs to the LLM in order to ground the answer.

For the legal field, it means gathering and preprocessing legal documents, prompt engineering, and intense human evaluation to improve specific tasks such as contract analysis or legal document summarization. This allows for a more efficient and accurate analysis of legal documents, leading to potential time and cost savings for legal professionals.
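
A minimal sketch of that retrieve-then-generate loop is below; the corpus, the toy retriever, and `call_llm` are hypothetical placeholders, not a real Thomson Reuters or vendor API.

```python
# Minimal sketch of a retrieve-then-generate (RAG) flow.
CORPUS = [
    "Smith v. Jones: a contract requires offer, acceptance, and consideration.",
    "State Statute 12-101: limitations period for written contracts is six years.",
]

def search_legal_index(query: str, k: int = 5) -> list[str]:
    # Toy keyword retriever: rank passages by query-term overlap.
    terms = set(query.lower().split())
    scored = sorted(CORPUS, key=lambda p: -len(terms & set(p.lower().split())))
    return scored[:k]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a real LLM API call")

def answer_with_rag(question: str) -> str:
    context = "\n\n".join(search_legal_index(question))   # 1. retrieve
    prompt = (
        "Answer using ONLY the context below. If it is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)                               # 2. generate, grounded
```

The prompt instructs the model to answer from the retrieved passages rather than its own memory, which is the grounding step described above.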

Retrieval of gold-standard legal content

Content is indeed still king, and data is the new gold. However, it’s the quality of that data and how it’s managed that becomes the critical factor.

There are many legal research tools on the market using LLMs. However, legal tech specialists need to ask the right questions and understand exactly which content sources and training data each tool's LLM draws on for every query.

As a side effect of their training, LLMs often tend to please, and if they don’t know the answer offhand, they may make something up in an attempt to be helpful. RAG can mitigate this by providing useful context to help answer questions, similar to an open-book quiz, thereby grounding an LLM’s answer and reducing the risk of hallucinations.

Shang Gao

Lead Applied Scientist, TR Labs

Westlaw content

Westlaw has always been the standard for legal research because of its content and proprietary editorial enhancements like the West Key Number System and KeyCite.

In the West reporting system, an attorney-editor reviews each published case, identifying and summarizing its points of law. These summaries, known as headnotes, are placed at the beginning of the case and typically run a single paragraph.

Each headnote is assigned a specific topic and key number, and they are organized in multi-volume books called Digests. These Digests act as subject indexes for the case law found in West reporters. It’s important to note that headnotes are editorial aids and do not serve as legal authority themselves.
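
As a loose illustration only, not Westlaw's actual data model, a digest can be thought of as a mapping from topic and key number to the headnotes filed under them:

```python
# Illustrative sketch of a digest as a subject index to case law.
# The case, topic, and key number below are invented examples.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Headnote:
    case: str
    topic: str
    key_number: str
    summary: str

digest: dict[tuple[str, str], list[Headnote]] = defaultdict(list)

def file_headnote(note: Headnote) -> None:
    digest[(note.topic, note.key_number)].append(note)

file_headnote(Headnote("Smith v. Jones", "Contracts", "95k1",
                       "Elements required to form a valid contract."))

# Looking up a topic and key number returns every case summarized under
# it, which is how a digest serves as a subject index to case law.
print(digest[("Contracts", "95k1")])
```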


 

For instance, AI-Assisted Research on Westlaw Precision focuses the LLM on the actual language of cases, statutes, and regulations. Rather than asking the LLM to generate answers from the question alone, it grounds the answer in the content it searches, finding the best cases, statutes, and regulations to address the question, as well as the most relevant portions within them.

Studies have shown that poor retrieval and/or bad context can be just as bad as or worse than relying on an LLM’s internal memory — just as a law student using an outdated textbook will give wrong legal answers, an LLM using RAG without good sources will generate unreliable content. That’s why the Westlaw and CoCounsel GenAI solutions are so dependable — they are backed by the largest and most comprehensive legal libraries available.

Shang Gao

Lead Applied Scientist, TR Labs

Practical Law content

Practical Law provides trusted guidance, checklists and forms that help attorneys practice law effectively, efficiently, and ultimately with less risk.

The reliability of this data is dependent on the people responsible for its labeling, structure, and annotations.

The team of over 650 legal expert editors is highly qualified, with experience at the world’s leading law firms, corporate law departments, and government agencies. Their full-time job is to create and maintain timely, reliable, and accurate resources so that attorneys have a strong starting point on their legal matters, whether those involve new legal realities, legislative changes, or developments in relevant practice areas.


 

Whether it’s a crisis, an unfamiliar matter, or an ever-evolving issue, they provide comprehensive insight and answers to your “how do I” questions.

RAG reliability

Yet even the gold standard of the most trusted and reliable sources of legal content falls short without a robust testing and benchmarking process. In other words, what constitutes a thoughtful and methodical process to ensure more trust, reliability, and accuracy?

Testing benchmarks

Our research products are composed of highly complex LLM flows and prompting techniques supported by the latest research ideas. To ensure our products are optimized and effective for all users, we need comprehensive benchmarks that cover the use cases lawyers may come across.

This is precisely why a Thomson Reuters benchmarking and evaluation team stressed the importance of both retrieval and generation components in legal AI systems.

For a RAG-based system, this means ensuring that the initial document retrieval is accurate and relevant, as it directly impacts the quality of the generated output. Legal tech specialists should therefore thoroughly analyze and weigh the benefits of both RAG and LLM components when considering legal-specific GenAI assistants.

For example, the Search a Database skill first uses various non-LLM-based search systems to retrieve relevant documents before the LLM synthesizes an answer. If the initial retrieval process is substandard, the LLM’s performance will be compromised.

Jake Heller

Head of CoCounsel, Thomson Reuters
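
One common way to quantify that retrieval step is recall@k: the fraction of queries for which at least one known-relevant document appears in the top k results. A small sketch, with made-up document IDs:

```python
# Sketch of a retrieval-quality metric: recall@k.
def recall_at_k(retrieved: list[list[str]],
                relevant: list[set[str]],
                k: int = 5) -> float:
    # A query counts as a hit if any of its top-k retrieved documents
    # is in its set of known-relevant documents.
    hits = sum(
        1 for docs, rel in zip(retrieved, relevant)
        if any(d in rel for d in docs[:k])
    )
    return hits / len(relevant)

# Example with invented document IDs: the first query hits, the second misses.
retrieved = [["d1", "d7", "d3"], ["d9", "d2", "d4"]]
relevant = [{"d3"}, {"d5"}]
print(recall_at_k(retrieved, relevant, k=3))  # 0.5
```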

CoCounsel’s Trust Team recognizes the subjective nature of legal tasks and the variability in what constitutes a correct answer. It addresses this by not only testing and benchmarking, but also releasing performance statistics and sample tests.

 

The testing process aims to simulate real attorney tasks. These tests are based on insights, customer feedback, and secondary sources. An attorney tester manually completes the test to establish an “ideal response,” which is peer-reviewed.

This ideal response sets the benchmark for passing scores. The Trust Team then uses CoCounsel’s skills to perform the task, generating a “model response” which is compared to the ideal response. Differences are assessed to determine if they render the output incomplete, incorrect, or misleading.

Tests can be failed for various reasons, even if the answer isn’t outright wrong, especially for subjective skills like summarization. Evaluation instructions are developed so that automated LLM grading aligns with the judgments of human reviewers.
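
In code, that evaluation loop might look like the sketch below, where `grade_against_ideal` is a hypothetical stand-in for the comparison step, whether a human reviewer or an LLM-as-judge calibrated against human reviewers:

```python
# Sketch of a pass-rate computation over a benchmark of test cases.
def grade_against_ideal(model_response: str, ideal_response: str) -> bool:
    """Return True if differences from the ideal response do not make
    the output incomplete, incorrect, or misleading."""
    raise NotImplementedError("stand-in for human or LLM-as-judge review")

def pass_rate(tests: list[tuple[str, str]]) -> float:
    # Each test pairs a model response with its peer-reviewed ideal response.
    passed = sum(
        1 for model_resp, ideal_resp in tests
        if grade_against_ideal(model_resp, ideal_resp)
    )
    return passed / len(tests)
```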

Four core skills were tested and below are some of the results.

Skill                    Pass Rate ¹
Extract Contract Data    98.8%
Review Documents         96.6%
Search a Database        95.6%
Summarize                90.6%

 

¹ Derived from datasets ranging from 89 to 98 test cases. Zero hallucinations were identified among failing tests. Please note that the Summarize skill is inherently subjective, where two attorneys may disagree on the level of correctness of the answer. As such, it is important to highlight that the tests in the dataset for this skill can be failed for many reasons – including where the answer is missing a detail that the tester considers to be a key detail. This does not mean that the answer is outright wrong.


The team is committed to continually monitoring and refining the skills tests by manually reviewing failure cases from the automated tests and spot-checking passing samples to ensure the automated evaluation aligns with human judgment.

The entire process underscores the importance of transparency in building trust with users. See the full Legal AI benchmarking results for further reference.


 


James Ju is an SEO and marketing professional. He is currently pursuing an AI for leaders post-graduate certification in partnership with UT Austin’s McCombs School of Business and Great Learning.

Shang Gao is a lead machine learning researcher at Thomson Reuters, where he designs, develops, and deploys solutions for legal and transactional language understanding, generative question answering, and knowledge retrieval. His recent work includes developing CoCounsel, Casetext’s AI legal assistant based on OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro; demonstrating that GPT-4 can pass all portions of the Uniform Bar Exam; and running evaluations for Stanford’s new LegalBench benchmark for evaluating the performance of large language models on legal reasoning.

Prior to Casetext, Shang was a research scientist at Oak Ridge National Laboratory, where he led a research team building clinical AI solutions for the National Cancer Institute. Shang has a PhD in Data Science from the University of Tennessee.
