Misadventures in Searching Unallocated Space

The parties in I-Med Pharma Inc. v. Biomatrix, Inc., stipulated to the search of the Plaintiff’s computer system by the Defendants’ forensic expert.  I-Med Pharma Inc. v. Biomatrix, Inc., 2011 U.S. Dist. LEXIS 141614 (D.N.J. Dec. 9, 2011).

The expert ran a combination of approximately 60 keywords, including French words, keywords expanded as wildcards and Boolean connectors, across all of the data on the computer system.  This included unallocated space.  The search was not targeted to specific custodians, relevant time periods or active files.  I-Med Pharma Inc., at *5-6.

Examples of the search terms included:

back order*

product*

profit*

HS*

I-Med Pharma Inc., at *5-6.
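To see why terms like these hit so often, here is a minimal sketch of how a trailing wildcard might be expanded into a pattern. The helper names (wildcard_to_regex, count_hits) and the sample text are hypothetical, and the opinion does not describe the forensic tool actually used, so treat this as an illustration only.

```python
import re

def wildcard_to_regex(term):
    """Turn a keyword with a trailing * into a case-insensitive regex."""
    # Escape the literal text, then let * stand for any run of word characters.
    pattern = re.escape(term).replace(r"\*", r"\w*")
    # \b anchors the match at the start of a word.
    return re.compile(r"\b" + pattern, re.IGNORECASE)

def count_hits(term, text):
    """Count every place in the text where the expanded term matches."""
    return len(wildcard_to_regex(term).findall(text))

# Hypothetical sample text, not taken from the case.
sample = "HS-200 lenses and the HSBC invoice; Products in production, hsts enabled."

for term in ["HS*", "product*", "profit*"]:
    print(term, count_hits(term, sample))
```

A two-letter stem such as HS* matches any word that starts with those letters, in either case, which is one reason short wildcard terms can produce very large hit counts across a whole drive.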

In the words of the Court:

The results should come as no surprise. The broad search terms hit millions of times across the large data set. In the unallocated space alone, the terms generated 64,382,929 hits. These hits represent an estimated 95 million pages of data.

I-Med Pharma Inc., at *6.

The opinion does not discuss whether any de-duplication, near-de-duplication, de-NISTing or other data reduction methodologies were applied to the keyword “hits.”

The Court expressed its concern in a footnote over the parties referring to keyword “hits” as separate documents.

As the Court explained:

Given the volume of hits and search terms used, this is essentially impossible—statistically speaking terms like “profit,” “loss,” “revenue,” and “profit” frequently occur together, and it stands to reason that at least some files mentioning product lines would make reference to more than one at the same time. Consequently, the Court is left to wonder whether the total hit and estimated page numbers are genuinely correct.

I-Med Pharma Inc., at *7, fn 4.
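The Court's point about hits versus documents can be shown with a small sketch. The function name, the term list and the sample documents below are assumptions made up for illustration; none of them come from the opinion.

```python
import re

# Hypothetical terms echoing the Court's footnote.
TERMS = ["profit", "loss", "revenue"]

def hits_and_documents(documents, terms):
    """Return (total keyword hits, number of documents with at least one hit)."""
    total_hits = 0
    docs_with_hits = 0
    for text in documents:
        hits_in_doc = sum(
            len(re.findall(r"\b" + re.escape(term), text, re.IGNORECASE))
            for term in terms
        )
        total_hits += hits_in_doc
        if hits_in_doc:
            docs_with_hits += 1
    return total_hits, docs_with_hits

# Made-up sample documents.
docs = [
    "Q3 revenue rose while profit fell; the loss on product A cut profit.",
    "Lunch menu for Friday.",
    "Revenue forecast and profit targets.",
]

print(hits_and_documents(docs, TERMS))  # (6, 2)
```

Six hits land in only two documents here. Treating each hit as its own document would overstate the review set, which is the concern the footnote raises.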

The Plaintiff was not thrilled at the idea of conducting a privilege review of the large data set.  The Magistrate Judge agreed, issuing an order: 1) allowing the Plaintiff to withhold ESI from the unallocated space and 2) permitting the Defendants to seek reimbursement for their search from the Plaintiff.  I-Med Pharma Inc., at *7.

The Magistrate Judge also found:

1) Good cause existed to modify the original discovery order, because the burden on the Plaintiff would “outweigh any potential benefit that may result.”

2) The Defendants had not met their burden of demonstrating the complete relevancy of the ESI they sought, including that the Defendants had not identified any ESI destroyed by the Plaintiff.

3) The overbroad search terms made the likelihood of finding relevant information that would be admissible at trial “minimal.”

I-Med Pharma Inc., at *7-8.

The Defendants appealed the Magistrate Judge’s order.  The District Court affirmed.  I-Med Pharma Inc., at *8, 18.

The Defendants had the difficult task of demonstrating the Magistrate Judge’s order was “clearly erroneous or contrary to law.” I-Med Pharma Inc., at *8.

The District Court described the Magistrate Judge’s order as a “reasonable exercise of…discretion in managing the scope of permissible discovery.” I-Med Pharma Inc., at *11.

The Defendants argued the Magistrate Judge applied the wrong standard to granting the relief from the original stipulation.  The Defendants argued “exceptional circumstances” were required to grant relief from the stipulation, citing a case involving a stipulation on liability.  I-Med Pharma Inc., at *11-12.

The Court explained that modifying the scope of discovery is very different from a party attempting to withdraw an admission of wrongdoing post-trial.  I-Med Pharma Inc., at *13.

As the Court stated:

During discovery, the parties are still actively uncovering the evidence needed to bring a case to trial and have ample opportunity to modify and adjust their litigation strategy to any important developments. Clearly a court has the power to modify stipulations concerning discovery terms and deadlines while discovery is still ongoing without the showing of manifest injustice. A court could not effectively perform its duty to fairly and efficiently manage discovery if every minor change to a stipulated briefing schedule or deposition date required a showing of “exceptional circumstances” or “substantial and real harm.” While courts should not casually discard agreements between the parties, nor should they abrogate their duty to balance both burden and the likelihood of uncovering relevant evidence merely because a party made an improvident agreement.

I-Med Pharma Inc., at *13-14.

Even if the “exceptional circumstances” standard applied, having attorneys review potentially 65 million “hits” (or 95 million pages of data) for privilege would be very expensive.  I-Med Pharma Inc., at *14-15.

The Court also rejected the Defense argument that privilege review could be limited by simply searching for the word “privileged.”  The Court explained the problems with such a search:

Even when dealing with intact files, potentially privileged information may often be found in emails, memoranda, presentations, or other documents that are not explicitly flagged as privileged or confidential. And since the data searched here is likely to contain fragmented or otherwise incomplete documents, it is entirely possible for privileged material to be found without its original identifying information.
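A short sketch shows the problem the Court describes. The function name and both text fragments are invented for illustration; they are not from the case record.

```python
def keyword_screen(fragments, flag="privileged"):
    """Return only the fragments that literally contain the flag word."""
    return [f for f in fragments if flag in f.lower()]

# Two made-up fragments: one carries its header, one has lost it.
fragments = [
    "PRIVILEGED AND CONFIDENTIAL: attorney-client communication re: supply contract",
    "...counsel thinks our exposure on the supply contract is significant, so we should...",
]

caught = keyword_screen(fragments)
missed = [f for f in fragments if f not in caught]

print(len(caught), "caught;", len(missed), "missed")
```

The second fragment reads like legal advice but has no header, so a search for the single word would pass over it.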

The Court upheld the Magistrate Judge’s order, closing the opinion with a cautionary message on search terms:

While Plaintiff should have known better than to agree to the search terms used here, the interests of justice and basic fairness are little served by forcing Plaintiff to undertake an enormously expensive privilege review of material that is unlikely to contain non-duplicative evidence.

I-Med Pharma Inc., at *17-18.

Bow Tie Thoughts

Parties get into trouble with search terms all the time. Moreover, a party may agree to a search methodology that they later regret. This can happen in multiple stages of discovery, whether it is collection, early case assessment, processing or discovery review.  Each stage involves using “search” technologies, but a search string used in a review platform to find ESI for a deposition might be too narrow at the collection stage.

While the facts of a case will control whether unallocated space needs to be searched, the prospect of conducting a privilege review of millions of files the “old-fashioned way” is mind-numbing.  It would be like trying to find a needle in a swimming pool of needles.

If extremely large data sets need to be reviewed, mechanical analytics is one way to expedite review and control discovery costs.  There are multiple products on the market with “predictive coding” abilities that learn from reviewers which ESI is responsive, thus “machine coding” the files as “relevant” or “privileged.”
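For readers curious about the general idea, here is a toy sketch of learning from reviewer-coded examples. It uses a simple naive Bayes word model, which is my assumption for illustration and not the algorithm of any particular commercial product. The function names, labels and seed documents are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labeled_docs):
    """Build word counts per label from (text, label) pairs coded by reviewers."""
    word_counts = {}        # label -> Counter of words seen under that label
    doc_counts = Counter()  # label -> number of training documents
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest naive Bayes score for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior plus Laplace-smoothed log likelihood of each known word.
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical seed set coded by human reviewers.
seed = [
    ("advice from counsel regarding the litigation hold", "privileged"),
    ("our attorney recommends we settle the claim", "privileged"),
    ("back order report for product shipments", "not_privileged"),
    ("quarterly profit and revenue summary", "not_privileged"),
]
model = train(seed)

print(predict(model, "counsel advice on the claim"))
print(predict(model, "product back order and revenue"))
```

A real review would need a far larger seed set, quality checks on the coding, and validation of the results. The sketch only shows that the machine's calls are driven by the examples the reviewers supply.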

The effectiveness of this technology will turn heavily on who does the initial review and what informs it: attorneys who understand the subject matter of the case; information from the parties on how they communicated and the language they used; the types of relevant ESI; search terms or concepts agreed to by the parties in a meet and confer; plus a host of other factors.

I am sure there are eDiscovery attorneys and Magistrate Judges eagerly awaiting the right case where this technology has been properly used; the methodology documented; and declarations from the software developers explaining the science of the algorithms.  When that day happens, one judge can issue an opinion validating the use of mechanical analytics in identifying responsive electronically stored information.

Attorneys will always be needed to decide what ESI to use in a deposition or trial, because a human being is better at determining what will convince other human beings of the “truth” of a case.  However, technology can make finding what is relevant in a data set with 65 million records far more effective than a brute force review of each record.
