How to make the e-discovery process more efficient with predictive coding

If you’re in the business of practicing law, you’ve probably heard of predictive coding but may still be working out how to implement it in your e-discovery process. Predictive coding is a hot buzzword in the industry, yet it is not a new concept, and it still has not been widely adopted in everyday practice. We’ll explore what you need to know about predictive coding so you can feel confident choosing the options that work for your business.

Predictive coding basics

Also referred to as technology-assisted review (TAR) or computer-assisted review (CAR), predictive coding is used to find responsive electronically stored information (ESI) during a legal case’s review phase. The software uses artificial intelligence to keep learning and make better decisions, which significantly expedites the review process and saves time and money.

Predictive coding starts by training the software with a seed set of data. A seed set is a sample of documents pulled from the entire group of documents that needs to be reviewed. Reviewers code each document in the seed set as responsive (relevant to the case) or unresponsive (not relevant) and input those decisions into the predictive coding software. The reviewer is often a highly skilled attorney with plenty of experience. As training continues, the software learns from these decisions and makes better, faster decisions over time.
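To make the training step concrete, here is a minimal sketch of how a seed set can teach software to score new documents. It assumes a very simple word-counting model (naive Bayes) and made-up sample documents; commercial predictive coding tools use their own, more sophisticated methods, and the function names here are purely illustrative.

```python
import math
from collections import Counter

LABELS = ("responsive", "unresponsive")

def tokenize(text):
    """Lowercase and split on whitespace (real tools use richer features)."""
    return text.lower().split()

def train(seed_set):
    """Build word counts from (text, label) pairs coded by reviewers."""
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    if any(doc_counts[label] == 0 for label in LABELS):
        raise ValueError("seed set needs at least one document per label")
    vocab = set(word_counts["responsive"]) | set(word_counts["unresponsive"])
    return word_counts, doc_counts, vocab

def responsiveness_score(model, text):
    """Estimate the probability (0 to 1) that a document is responsive."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_prob = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        lp = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing so an unseen word/label pair is not zero
                count = word_counts[label][word] + 1
                lp += math.log(count / (total_words + len(vocab)))
        log_prob[label] = lp
    # Turn the two log scores into a normalized probability
    highest = max(log_prob.values())
    weights = {label: math.exp(lp - highest) for label, lp in log_prob.items()}
    return weights["responsive"] / sum(weights.values())

# Hypothetical seed set, coded by a reviewer
seed_set = [
    ("merger pricing agreement draft", "responsive"),
    ("pricing terms for the merger", "responsive"),
    ("lunch menu for friday", "unresponsive"),
    ("office holiday party schedule", "unresponsive"),
]
model = train(seed_set)
score_a = responsiveness_score(model, "revised merger pricing")
score_b = responsiveness_score(model, "friday party menu")
print(f"merger document: {score_a:.2f}, party document: {score_b:.2f}")
```

The more accurately coded examples the reviewers supply, the more evidence the model has for each word, which is the sense in which the software "learns" as training continues.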

Why is predictive coding not widely adopted?

So, if this revolutionary technology makes reviews easier, faster, and more accurate, and saves money as well, why isn’t everyone using it? Several barriers have kept predictive coding from being widely adopted in the legal industry. It’s also important to remember that disruptive technology is commonly met with resistance or is slow to be implemented before it becomes established.

First, the technology behind predictive coding is complex. It relies on advanced data science and statistical sampling, which require highly specialized skills. Even though the backend technology is complex, the user experience is much simpler and easier to comprehend. Still, predictive coding technology is in its early stages of development, and simple, easy-to-use tools have not yet become mainstream.

Second, predictive coding technology is expensive to implement. While predictive coding clearly pays off as a long-term investment, it can be difficult to finance and commit resources to the technology. It takes a substantial amount of time and money to develop the software and to train it properly before it reaches its full potential as an e-discovery tool.

Lastly, court endorsements of predictive coding are still relatively new, and risk-averse lawyers are wary of how documents found with predictive coding will be received. Federal Magistrate Judge Andrew Peck’s decision in Da Silva Moore v. Publicis Groupe (Southern District of New York, 2012) was the first official judicial endorsement of using predictive coding to review documents. Today, judges commonly accept predictive coding in e-discovery, and that acceptance is expected to grow as the use of predictive coding expands.

Predictive coding best practices

If predictive coding is new to your law practice, there are several best practices you can follow to make sure you’re getting the most from your software. First, become familiar with the technology. You don’t have to be a statistics expert, but having a solid understanding of how the system works is important.

Get off to the right start by properly training the system. Predictive coding is a garbage-in, garbage-out application: the software learns from whatever guidance the trainer feeds it, correct or incorrect. Start by carefully selecting a sample of relevant and non-relevant documents. This will be your seed set for training the software. It’s also important to carefully consider your team of reviewers. These should be your most senior attorneys, with plenty of experience to make accurate decisions. It’s helpful to train with two or three experts collaborating, as opposed to a single reviewer, in order to help ensure the quality of the system and avoid biased training.

It’s also important to establish an appropriate relevancy threshold. Knowing how confident the system is in a document’s relevance will help your team manage manual reviews of responsive documents. Remember that the goal is not 100 percent accuracy from your predictive coding, because we don’t expect perfect reviews from humans either. Essentially, predictive coding is a tool that allows more time for manual review of a smaller, more relevant set of documents. When humans have a smaller set of documents to assess, they’re less likely to make mistakes during their review.
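As a rough illustration of how a relevancy threshold works, the sketch below splits scored documents into a manual review queue and a set-aside pile. The document IDs, the scores, and the 0.7 cutoff are all hypothetical; the right threshold depends on your matter and your software.

```python
def triage(scored_docs, threshold=0.7):
    """Split (doc_id, score) pairs at the relevancy threshold.

    Documents scoring at or above the threshold go to manual review,
    highest confidence first; the rest are set aside for validation.
    """
    for_review = sorted(
        (pair for pair in scored_docs if pair[1] >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
    set_aside = [pair for pair in scored_docs if pair[1] < threshold]
    return for_review, set_aside

# Hypothetical relevance scores produced by the software (0 to 1)
scored = [("DOC-1", 0.92), ("DOC-2", 0.35), ("DOC-3", 0.70), ("DOC-4", 0.10)]

for_review, set_aside = triage(scored, threshold=0.7)
review_ids = [doc_id for doc_id, _ in for_review]
set_aside_ids = [doc_id for doc_id, _ in set_aside]
print("Manual review:", review_ids)
print("Set aside:", set_aside_ids)
```

Lowering the threshold sends more documents to manual review and reduces the chance of missing a responsive one; raising it shrinks the review set at the cost of more misses.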

It’s vital to validate the results of your predictive coding software for quality assurance. While it’s okay that predictive coding isn’t perfect, you do want to be sure that correctly trained software is not missing documents that should have been coded as responsive. You can validate quality by looking over the documents identified as not responsive. Search these documents for the keywords used by the system to confirm that the software is correctly identifying relevant documents. If the documents your predictive coding software identified as not responsive are indeed not relevant, you can feel confident the software is doing its job accurately.
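The keyword check described above could be sketched roughly as follows. The documents and keywords are invented for illustration, and the random sample for a manual spot check is an added assumption about how a team might supplement the keyword search, not a feature of any particular product.

```python
import random

def keyword_hits(documents, keywords):
    """Return IDs of documents whose text contains any keyword.

    Uses simple case-insensitive substring matching, which is crude:
    it can over-match (e.g. "art" inside "party").
    """
    lowered = [keyword.lower() for keyword in keywords]
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(keyword in text.lower() for keyword in lowered)
    ]

def draw_sample(documents, sample_size, seed=0):
    """Pick a reproducible random sample of document IDs for manual review."""
    rng = random.Random(seed)
    doc_ids = sorted(documents)
    return rng.sample(doc_ids, min(sample_size, len(doc_ids)))

# Hypothetical documents the software coded as not responsive
not_responsive = {
    "DOC-10": "Lunch menu for Friday",
    "DOC-11": "Updated merger pricing schedule",
    "DOC-12": "Parking garage notice",
}

hits = keyword_hits(not_responsive, ["merger", "pricing"])
hit_rate = len(hits) / len(not_responsive)
spot_check = draw_sample(not_responsive, sample_size=2)
print(f"Keyword hits to re-examine: {hits} ({hit_rate:.0%} of the set)")
print("Random spot check:", spot_check)
```

A keyword hit does not prove the software made a mistake; it only flags a document that a reviewer should look at again. If many flagged documents turn out to be responsive, that is a sign the system needs more training.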

Will predictive coding take over human jobs?

While there’s a lot of hype around AI and predictive coding taking over people’s jobs, it’s highly unlikely. Predictive coding works because human experts train the systems. The software relies on the attorney’s expertise to make the right decisions, and it learns from those decisions as time goes on. The AI behind predictive coding really only augments the attorney’s own abilities by detecting patterns that are seemingly invisible to human reviewers. The people who train the software and manually review responsive documents will remain vital to the accuracy and success of predictive coding. Essentially, predictive coding is a tool that humans use to save time and money and to make more accurate decisions during the e-discovery process.