To Use Search Terms Before or After Predictive Coding

Here is a simple question with a complex answer: Should search terms be used before or after predictive coding? That question was the subject of dueling motions in In re Allergan Biocell Textured Breast Implant Prods. Liab. Litig. (D.N.J. Oct. 25, 2022, No. MDL No. 2921) 2022 U.S.Dist.LEXIS 200790, at *65-66.

For the remaining discovery, the Producing Party (Defendants) took the position that applying predictive coding after search terms was the best review methodology. The unreviewed data consisted of sixty custodial files totaling approximately 560,000 records, or more than 3.5 million pages. In re Allergan Biocell, at *67.

The Plaintiffs argued the Defendants should apply the predictive coding model to the full dataset before search terms, because doing so created no additional burden and would increase the accuracy of the review. Moreover, applying search terms first would remove data from the predictive coding model, thus decreasing the accuracy of the review.

The Defendants argued that case law supported their workflow of using predictive coding together with search terms. The Court rejected that reading of the cases, stating that courts had not settled the question of which workflow is best for using technology-assisted review and search terms. In re Allergan Biocell, at *70.

The Court rejected the Defendants’ undue burden argument, which claimed that applying the predictive coding model to the full dataset collected (nearly 10 terabytes of data) would require the Defendants to recollect data. The Court bluntly asked, where did the collected data go? Why do you need to collect again? Any collected data should have been preserved, so there should not be a need to recollect data. In re Allergan Biocell, at *72-73.

From the evidence presented in declarations, the Court could not determine which workflow was better: applying search terms before predictive coding or after it. In re Allergan Biocell, at *72-73.

The Court rejected the Defendants’ proposed workflow and concluded the order with the following: 

We have little doubt that the parties knew at the outset the costs of ESI discovery would be high, and the review process would be extensive. The fact is, without testing on an agreed set of documents, no one can predict whether the application of TAR with or without search terms is the more economic and feasible way to proceed. Implementing TAR, at this stage, after the application of search terms, opens the door for potential disputes that may arise related to the accuracy of the review process and will further delay the completion of discovery and drive costs upward. Finally, applying TAR to an already reduced (via search terms) set of documents will reduce further the identified responsive documents and will certainly not reveal documents that the application of search terms has precluded. Because Plaintiffs did not bargain for this at the outset, over a year ago, it is inappropriate to force them to accept it now.

Bow Tie Thoughts 

The question of whether to use predictive coding before or after search terms is a tricky one. A predictive coding model can be made by reviewing a sample of data. If the software uses continuous active learning, the model keeps learning from the reviewers’ decisions about what is relevant or irrelevant. Models can also be created from specific searches. For example, one method of training a predictive coding model for attorney-client privileged communications is to create a search for email between the client’s email domain name and the law firms that represent the client. This model is born from search terms. However, it can be applied to data outside of the search terms to identify different law firms that represented a client in the past.
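The privilege example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the domain names, the Email structure, and the function names are hypothetical, and a real review platform would run this search inside its own interface. The sketch shows the search step alone, which gathers email between the client and its known law firms so reviewers can code those hits as the first training examples for a model.

```python
from dataclasses import dataclass

# Hypothetical domains, for illustration only.
CLIENT_DOMAIN = "client.example"
LAW_FIRM_DOMAINS = {"firm-a.example", "firm-b.example"}


@dataclass
class Email:
    sender: str
    recipients: list[str]
    body: str


def domain(address: str) -> str:
    """Return the lowercased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def is_client_counsel_email(email: Email) -> bool:
    """True when the email runs between the client domain and a known law firm."""
    domains = {domain(email.sender)} | {domain(r) for r in email.recipients}
    return CLIENT_DOMAIN in domains and bool(domains & LAW_FIRM_DOMAINS)


def seed_privilege_training_set(emails: list[Email]) -> list[Email]:
    """Collect the search hits that reviewers would code to start training a model."""
    return [e for e in emails if is_client_counsel_email(e)]
```

The search only seeds the training set. Once reviewers have coded the hits, the resulting model can be run against email that never matched the search, which is how it could surface a law firm that was not on the original list.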

A possible workflow in the above case could be applying a predictive coding model to the search hits from a custodian’s email account, limited by date. That could be a perfectly valid search methodology, because it focuses on what is likely to be relevant, as opposed to hoping to find what is relevant with data from the predictive coding model alone.
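A minimal Python sketch of that culling step follows. It is not drawn from the case record: the Record fields, the cull function, and the plain keyword match are assumptions made for illustration, and real search term syntax is far richer. The sketch narrows records by custodian, date range, and search terms, and it also returns the records that were excluded, since those are the ones a predictive coding model applied afterward would never score.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    doc_id: int
    custodian: str
    sent: date
    text: str


def hits_search_terms(record: Record, terms: list[str]) -> bool:
    """Simple case-insensitive keyword match (a stand-in for real search syntax)."""
    lowered = record.text.lower()
    return any(term.lower() in lowered for term in terms)


def cull(records, custodian, start, end, terms):
    """Split records into those passed to the model and those never scored."""
    kept, excluded = [], []
    for r in records:
        in_scope = r.custodian == custodian and start <= r.sent <= end
        if in_scope and hits_search_terms(r, terms):
            kept.append(r)
        else:
            excluded.append(r)
    return kept, excluded
```

Only the kept list would go to the predictive coding model. A relevant record that lacks the chosen terms lands in the excluded list, which is the concern the Plaintiffs raised and the Court echoed in its order.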

The wildcards in this case are not knowing the type of data at issue, the proposed search terms, the type of predictive coding being used, and the requests for production at issue. It is extremely difficult to say what the “best” workflow is for identifying responsive records. Given these wildcards, the Court made the best call with the information available.