A little while back, Craig Ball wrote an article, “E-Mail Isn’t as Ethereal as You Might Think,” for Law Technology News, which described some high-level basics of the MIME Internet mail format standard. Much more technical than the typical LTN article, it highlighted the need for more articles and discussion on the ESI itself. In that vein, here is the first of several articles discussing and examining different email formats. Keep in mind that processing email for E-Discovery may be best performed by legally sound email management products that have been verified by leading independent third-party litigation consultants.
This isn’t just geek stuff. It’s lawyer stuff, too.
- Craig Ball
Major Email Types Encountered in E-Discovery
Here is a short introduction to the major types of email encountered in E-Discovery.
- Internet (MIME/mbox): Virtually all mail servers today can handle MIME format email. Open source mail servers often use MIME as their default format for sending email both within their own environment and out to users of other mail servers, while servers like Exchange and Domino send and receive MIME when communicating outside their deployment. MIME is an open standard defined by the Internet Engineering Task Force (IETF) in several Requests for Comments (RFCs). The base message format is described in RFC 5322, and the MIME extensions are defined in RFC 2045 through RFC 2049. Mbox files are container files for MIME format messages. The basic format is a text file comprising a concatenated list of MIME messages, with a special “From line” marking the start of each message.
- Microsoft (MSG/PST, MIME/EML): Microsoft Outlook’s native email format is MSG, a file format described in MS-OXMSG. End-users deal with Personal Storage Table (PST) files more often than MSG files; however, many E-Discovery practitioners are familiar with MSG files, which are often included in native productions. End-users can generate MSG files by dragging email from Outlook and dropping it onto the Desktop or another file system location. PST files are container files for MSG format messages. While Microsoft Outlook does not support MIME email, you can read it using Microsoft Windows Live Mail (WLM) or Outlook Express. Simply ensure the MIME file has the .EML file extension and open it in WLM or Outlook Express.
- Lotus (Notes CD/NSF, DXL): Before MIME was established, Lotus created its own proprietary rich data format, called Notes Compound Document (aka Notes CD or Notes Rich Text). NSF files are container files for Notes CD format messages. In Lotus Notes 6 and later, Lotus mail can also be exported as Domino XML Language (DXL) objects.
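To make the mbox description above concrete, here is a minimal sketch in Python that uses the standard library `mailbox` module to split an mbox file on its “From lines” and read each message's subject. The sample messages, the addresses, and the helper name `list_subjects` are made up for illustration; real collections should still go through a verified processing tool.

```python
import mailbox
import os
import tempfile

# Two tiny hypothetical messages in mbox form.
# Each message starts with a "From " separator line (the "From line").
SAMPLE_MBOX = (
    "From alice@example.com Mon Jan  1 09:00:00 2001\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: First message\n"
    "\n"
    "Hello Bob.\n"
    "\n"
    "From bob@example.com Mon Jan  1 10:00:00 2001\n"
    "From: bob@example.com\n"
    "To: alice@example.com\n"
    "Subject: Second message\n"
    "\n"
    "Hello Alice.\n"
)


def list_subjects(mbox_text):
    """Write mbox text to a temporary file and return each message's Subject."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "w", newline="\n") as handle:
            handle.write(mbox_text)
        box = mailbox.mbox(path)
        try:
            # Iterating the mailbox yields one parsed message per "From line".
            return [message["Subject"] for message in box]
        finally:
            box.close()


if __name__ == "__main__":
    for subject in list_subjects(SAMPLE_MBOX):
        print(subject)
```

The container adds nothing beyond the separator lines: once the file is split, each piece is an ordinary MIME message that can be handled on its own.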
Email Types in the EDRM Enron Email Data Set 2.0
To get a full appreciation for the different email formats, it’s useful to look at some email in each of them. The EDRM Enron Email Data Set 2.0 is available in multiple formats that can be explored. The email was produced by ZL Unified Archive®, which can archive, collect, and manage email in the various native formats and convert between them.
- EDRM XML: This is the open E-Discovery load file standard defined by the EDRM XML working group. The EDRM XML files in this data set include ESI metadata along with native email in MIME format (with attachments), extracted native attachments, and text extracts.
- MIME: While the MIME files are included in the EDRM XML distribution, it is possible to access the MIME without reading the EDRM XML. This has been useful for some research organizations.
- PST: All of the email is also produced as PST files for the custodians. These files can be read directly in Microsoft Outlook or processed by virtually all archives and E-Discovery tools.
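As a rough illustration of what “MIME with attachments” and “extracted native attachments” mean in practice, the following Python sketch builds a small MIME message, parses it back, and pulls out the body and attachment. It uses only the standard library `email` package; the message content, the file name `report.csv`, and the helper names `build_sample` and `summarize` are hypothetical and are not taken from the Enron data set or from any particular E-Discovery tool.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample():
    """Return the raw bytes of a hypothetical MIME message with one attachment."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Report attached"
    msg.set_content("See the attached file.")
    msg.add_attachment(
        b"col1,col2\n1,2\n",
        maintype="application",
        subtype="octet-stream",
        filename="report.csv",
    )
    return msg.as_bytes()


def summarize(raw):
    """Parse raw MIME bytes; return (subject, plain-text body, attachments)."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    attachments = [
        (part.get_filename(), part.get_content_type(), part.get_content())
        for part in msg.iter_attachments()
    ]
    return str(msg["Subject"]), body.get_content().strip(), attachments


if __name__ == "__main__":
    subject, body, attachments = summarize(build_sample())
    print(subject)
    print(body)
    for name, content_type, data in attachments:
        print(name, content_type, len(data), "bytes")
```

The attachment bytes returned by `summarize` are already decoded from their transfer encoding, which is roughly what an "extracted native attachment" amounts to.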
Email Types in the EDRM Internationalization Data Set
The EDRM Internationalization Data Set provides email in an additional format:
- mbox: Mbox files are available with email in the following languages: Arabic, Catalan, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, Turkish.
I anticipate writing a few more articles on this topic exploring each of the different types of email. It is my hope that lawyers and other E-Discovery specialists will be able to “grok” email a bit more through these posts.
If you are interested in learning more about these email formats, how to manage them in your enterprise, and how to migrate between them, consider contacting ZL Technologies. ZL Unified Archive® can not only manage email on Exchange, Domino, and Internet mail servers, but it can also migrate email between the different formats.