It Is Not Impossible to Produce Gmail in Native File Format

A court recently held that it was not “impossible” to produce Gmail in native file format. A more accurate holding might have been that Gmail can be produced “in a reasonably usable form.” Keaton v. Hannum, 2013 U.S. Dist. LEXIS 60519, at *4-5 (S.D. Ind. Apr. 29, 2013).


Here is the relevant section from the case:

Zook has argued that she cannot produce her Gmail files in a .pst format because no native format exists for Gmail (i.e., Google) email accounts. The Court finds this to be incorrect based on Exhibit 2 provided by Zook in her Opposition Brief. [Dkt. 92 at Ex. 2 (Ball, Craig: Latin: To Bring With You Under Penalty of Punishment, EDD Update (Apr. 17, 2010)).] Exhibit 2 explains that, although Gmail does not support a “Save As” feature to generate a single message format or PST, the messages can be downloaded to Outlook and saved as .eml or .msg files, or, as the author did, generate a PDF Portfolio: “a collection of multiple files in varying format that are housed in a single, viewable and searchable container.” [Id.] In fact, Zook has already compiled most of her archived Gmail emails between her and Keaton in a .pst format when Victim.pst was created. It is not impossible to create a “native” file for Gmail emails.

Keaton, at *4-5.

Bow Tie Thoughts

I contacted my friend Charlie Kaupp at Digital Strata for his thoughts on producing Gmail in native file format. Here is what he said:

– Downloading to Outlook won’t result in a true native collection, but rather in a copy of the native, so it should be accounted for as a copy.

– Downloading to Outlook requires the user’s login and password, so it is not a viable option for uncooperative custodians or for large numbers of collections.

– There are other tools that will allow you to create a direct IMAP connection and download that directly into other formats with full logging, which may be more defensible than the Outlook option.

– PDF Portfolios are not native copies and will lose a great deal of metadata. Conversion to PST or MSG is the best option for preserving metadata.
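
Kaupp’s point about a direct IMAP connection with full logging can be illustrated with a minimal sketch using Python’s standard `imaplib` conventions. It is not a substitute for the forensic collection tools he mentions; it only shows the idea. Each message’s raw bytes are written to an .eml file and a SHA-256 hash is logged per message. The folder is opened read-only and messages are fetched with `BODY.PEEK[]`, so the server copy and its read/unread flags are left alone. The function names and the one-file-per-UID layout are assumptions for illustration; because IMAP UIDs are unique only within a folder, the sketch assumes a separate output directory per folder.

```python
import hashlib
import logging
from pathlib import Path

log = logging.getLogger("imap_collection")


def save_message(raw: bytes, out_dir: Path, uid: str) -> str:
    """Write one message's raw bytes to <uid>.eml and return its SHA-256 hex digest."""
    out_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(raw).hexdigest()
    (out_dir / f"{uid}.eml").write_bytes(raw)
    log.info("saved uid=%s bytes=%d sha256=%s", uid, len(raw), digest)
    return digest


def collect_folder(conn, folder: str, out_dir: Path) -> dict:
    """Copy every message in one IMAP folder; conn is an imaplib.IMAP4 instance."""
    # readonly=True sends EXAMINE instead of SELECT, so the mailbox is not modified
    typ, _ = conn.select(folder, readonly=True)
    if typ != "OK":
        raise RuntimeError(f"cannot open folder {folder!r}")
    typ, data = conn.uid("SEARCH", None, "ALL")
    if typ != "OK":
        raise RuntimeError(f"search failed in {folder!r}")
    hashes = {}
    for uid in data[0].split():
        # BODY.PEEK[] returns the full raw message without setting the \Seen flag
        typ, parts = conn.uid("FETCH", uid, "(BODY.PEEK[])")
        if typ != "OK" or not parts or not isinstance(parts[0], tuple):
            log.warning("fetch failed for uid=%s in %s", uid, folder)
            continue
        key = uid.decode()
        hashes[key] = save_message(parts[0][1], out_dir, key)
    log.info("folder=%s collected=%d", folder, len(hashes))
    return hashes
```

Against Gmail, `conn` would be an `imaplib.IMAP4_SSL("imap.gmail.com")` connection after `conn.login(...)`, which is where the custodian’s credentials come in. `imaplib` does not quote mailbox names for you, so a folder such as `[Gmail]/All Mail` would have to be passed with its quotes, as `'"[Gmail]/All Mail"'`.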

Webmail creates many challenges in collecting the relevant ESI. Consulting an expert is always a good way to determine a course of action that is proportional to the case. Many things lawyers believe to be “impossible” are well within the means of a data collection expert.


Josh Gilliland is a California attorney who focuses his practice on eDiscovery. Josh is the co-creator of The Legal Geeks, which made the ABA Journal Blawg 100 from 2013 to 2016 and was nominated for Best Podcast at the 2015 Geekie Awards. Josh has presented at legal conferences and comic book conventions across the United States. He also ties a mean bow tie.

  1. I agree with your expert that compiling email to PDF is not the best choice. We recently dealt with opposing counsel who converted their client’s email from native format to PDF before production. They also didn’t produce the attachments to those emails in native format, and later claimed the truncated PDF version was the “true document.” After much haggling and threats to seek a court order, the original native file was disclosed. The original email with its original attachment (a series of nested files) said much more than the truncated PDF version. It was a game-changer in this case. Always insist on native format, even if PDFs are produced for the convenience of counsel.

  2. Ah, Native! First, thanks, Josh, for the pointer to the decision. I hadn’t seen it. Anyway…

    Ah, native. When we speak of e-mail, hardly anyone produces in the genuine native form of e-mail; but that doesn’t mean that some forms aren’t so close to the native (so “near-native”) that they can’t reasonably be referred to as native, as the Court does. The scale along the native messaging continuum is measured in functionality and completeness. “Functionality” as in, “Are the various components of the message produced as fielded data (so as to facilitate fielded searching, as the native does)?” “Completeness” as in, “Is the base64-encoded MIME content of the attachments furnished?” or “Is the header data, with its UTC time values and unique Message-ID, present?” Oh, it’s not? Then it’s not what I’d call “near-native.”

    There is so much confusion about native when it comes to e-mail. No one produces the true native form of Exchange-hosted e-mail (as in, “Here is a copy of our entire enterprise EDB file from our Exchange server”). If we export Exchange to Outlook (i.e., to a PST container file), that’s near-native (though we routinely call it native for convenience), and it’s not really, truly complete because there’s more and different metadata within the Exchange environment. But who cares? You’re getting all you could possibly need for virtually any litigation context, and (most importantly) you’re getting a nearly complete measure of functionality and completeness.

    The native form of any message that traverses the internet must conform to a standard called RFC 2822 (since superseded by RFC 5322). Arguably, the “native” form of any e-mail is its RFC 2822 transmission; yet, when that message arrives, e-mail clients graft on other information (like folders and flags), and these, too, may need to be considered components of a “native” production; i.e., native to the client application.

    The native form of Gmail is a massive database in a Google data center someplace (or in many places). If the goal is merely to win an idiotic argument, then I would have to concede that no one can produce Gmail in its “native” format. Impossible? No. Infeasible? Yes. Again, Gmail is a giant database in the cloud residing in a storage environment few litigants could afford to collect, let alone use. If you don’t believe me, visit a Best Buy store and ask to buy a 100-petabyte hard drive. You could argue that a copy of anything isn’t native because it no longer resides in its native environment. But we are talking about law, not philosophy. If a tree falls in the forest, I don’t give a damn unless I need firewood.

    There is a range of options for preserving a substantial measure of the functionality and completeness of Gmail. One would be to produce in Gmail. HUH?!?! Yes, you could conceivably open a fresh Gmail account for production, populate it with responsive messages, and turn over the credentials to the requesting party. That’s probably as close to native as you can get (yes, some metadata will change), but it’s not what most people expect or want. Alternatively, an IMAP capture to PST format (using Outlook or a forensic collection tool) is an eminently practical option. What you get will not look or work exactly like Gmail (i.e., messages won’t thread in quite the same way, and flagging will be a bit different), but it will supply a large measure of the functionality and completeness of the Gmail source. Plus, it’s a form that lends itself to many downstream processing options.

    Where e-mail is concerned, we should be less hung up on the term “native” and instead specify the actual form or forms we seek that are best suited to what we need and want to do with the data. That means understanding the differences between the forms, not just trotting out a buzzword like “native.”

    P.S. I would be loath to call a PDF Portfolio a “native” or “near-native” form of production for e-mail. It’s a text-searchable imaged production, but the data tends not to be fielded, and its functionality suffers, especially with respect to the ability to migrate the data as messages and attachments between platforms. That said, very sophisticated users might appreciate that a PDF is capable of holding binary content; that is, it’s possible to put “native” or “near-native” forms *inside* a PDF container (though no one currently does so in any production environment).

    1. Craig, thank you for the detailed and thoughtful reply. If you want to do a podcast on this topic, just say the word.
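
Craig’s functionality and completeness tests can be made concrete with a short sketch. Assuming the raw RFC 2822/5322 bytes of a message are in hand (for example, an .eml file from an IMAP capture), Python’s standard `email` package can pull out fielded data: sender, recipients, subject, the unique Message-ID, the Date header normalized to UTC, and each attachment with its base64 content decoded. The function name and dictionary keys are hypothetical, and the sketch assumes the message has a Date header.

```python
from datetime import timezone
from email import policy
from email.parser import BytesParser
from email.utils import parsedate_to_datetime


def fielded_record(raw: bytes) -> dict:
    """Extract searchable fields from one raw RFC 5322 message (hypothetical helper)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Date carries the sender's local offset; normalize it to UTC for consistent sorting
    sent = parsedate_to_datetime(str(msg["Date"])).astimezone(timezone.utc)
    # iter_attachments() skips the body part; get_content() decodes the base64 payload
    attachments = [
        (part.get_filename(), len(part.get_content()))
        for part in msg.iter_attachments()
    ]
    return {
        "message_id": str(msg["Message-ID"]),
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "sent_utc": sent.isoformat(),
        "attachments": attachments,
    }
```

Each key in the result can become a separate searchable field in a review platform, which is the kind of fielded data a PDF image of the same message does not usually keep.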

  3. I would be interested in the tools that Mr. Kaupp is referring to when he says, “There are other tools that will allow you to create a direct IMAP connection and download that directly into other formats with full logging, which may be more defensible than the Outlook option.” The Outlook solution does work, but one needs to be careful to use the setting “keep a copy on server,” or else all of the custodian’s emails will be downloaded and REMOVED from their Inbox. Also, the Outlook solution can sometimes lose any folder structure that the custodian had in their Gmail account. Good information, Josh.

    1. Tim:

      All good points. An IMAP collection (versus a POP3 collection) *should* be able to capture the Gmail folder structure along with the message content. POP3 is a big issue for Yahoo Mail. I find that I often have to pay $20 to upgrade the account to IMAP to get a foldered Yahoo capture. I hate having to put my fingers on an account to that extent.

      As to Mr. Kaupp’s point about needing a user’s login and password, that’s always going to be the case for webmail absent the (very-hard-to-get) assistance of the host ISP.
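
On the folder-structure question, a short Python sketch shows how an IMAP capture could enumerate a custodian’s folders before collecting them, so each message can be recorded with the folder it came from. Gmail exposes its labels as IMAP folders, so a message with several labels may appear in several folders and need de-duplication (for example, by Message-ID). The parser assumes quoted folder names, as Gmail typically sends them; names returned as IMAP literals and non-ASCII names in modified UTF-7 are not handled. The function names are hypothetical.

```python
import re

# One untagged LIST response line: (flags) "delimiter" "folder name"
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')


def parse_list_line(line: bytes) -> tuple:
    """Split an IMAP LIST response line into (flags, delimiter, folder name)."""
    m = LIST_LINE.match(line)
    if m is None:
        raise ValueError(f"unexpected LIST line: {line!r}")
    flags = m.group("flags").decode().split()
    raw_delim = m.group("delim")
    delim = None if raw_delim == b"NIL" else raw_delim.decode().strip('"')
    name = m.group("name").decode().strip('"')
    return flags, delim, name


def folder_paths(conn) -> list:
    """Return selectable folder names; conn is an imaplib.IMAP4 instance."""
    typ, lines = conn.list()
    if typ != "OK":
        raise RuntimeError("LIST failed")
    paths = []
    for line in lines:
        flags, _, name = parse_list_line(line)
        # \Noselect marks containers (such as [Gmail]) that hold no messages themselves
        if "\\noselect" not in (f.lower() for f in flags):
            paths.append(name)
    return paths
```

Collecting folder by folder, and logging the folder path alongside each message, is one way to keep the structure Tim worries about losing.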

  4. One other option is to use Google’s Vault service. It has an e-discovery component that allows you to put custodians on litigation hold, filter their messages, and download and export the custodian’s mail store in MBOX format. I’ve been testing it a bit this week and it seems OK; I would be interested to see how well it scales over large data sets.

    There are also tools such as F-Response with a cloud connector component which will allow you to interface with the Gmail servers and retrieve mailboxes.
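
Whatever tool produces it, an MBOX export like the one Vault provides can be checked with Python’s standard `mailbox` module. The sketch below, assuming the export has already been downloaded to a local file (the path and function name are hypothetical), lists each message’s Message-ID and a SHA-256 hash of its raw bytes, giving a simple inventory to compare against a collection log. It reads one message at a time rather than loading the whole file into memory. Hashes taken from MBOX-stored bytes may not match hashes of the same message collected another way, because formats can differ in line endings and escaping.

```python
import hashlib
import mailbox
from email import message_from_bytes


def inventory_mbox(path: str) -> list:
    """Return (Message-ID, SHA-256 of raw bytes) for every message in an MBOX file."""
    box = mailbox.mbox(path, create=False)  # raises if the file does not exist
    rows = []
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # message bytes without the mbox "From " separator
            msg_id = message_from_bytes(raw).get("Message-ID", "")
            rows.append((msg_id, hashlib.sha256(raw).hexdigest()))
    finally:
        box.close()
    return rows
```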