The EDRM Enron PST files are now available on the EDRM Data Set website, thanks to George Socha, EDRM, and ZL Technologies. I am co-lead of the EDRM Data Set project and personally worked on this data set at ZL Technologies, so I thought I would provide a brief introduction to the data before our formal description comes out. In the interest of full disclosure, I created the PST files available at EnronData.org as a precursor to the EDRM PST files that are now available. If you have any questions about the data set that you would like answered, either in the paper or informally, please post them to the EDRM Data Set webpage, here, or the litsupport mailing list thread. Alternatively, you can send email to firstname.lastname@example.org or to me directly at email@example.com.
As with other publicly available Enron email, this data set originates from a FERC distribution. The FERC distribution contains email from Microsoft Exchange and Lotus Domino environments that was processed for eDiscovery through IPRO. A challenge with this data is that it is available only as a load file, not as email. The EDRM Data Set project’s research into conversion utilities indicated that many eDiscovery tools can convert from email format to load file format, but not the other way around. Based on this, ZL created conversion tools to convert IPRO’s load file format back to email format, from which the PST files were created.
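To make the load-file-to-email direction concrete, here is a minimal Python sketch of the general idea. It is not ZL's conversion tooling and it does not reproduce IPRO's actual load file layout, neither of which is described in this post. The pipe delimiter, the column names (`DocID`, `From`, `To`, `Subject`, `Date`, `Body`), the `X-Load-File-DocID` header, and the sample record are all assumptions made up for illustration.

```python
import csv
import io
from email.message import EmailMessage

# Hypothetical load file: one delimited record per message.
# Real load files define their own fields and delimiters.
SAMPLE_LOAD_FILE = (
    "DocID|From|To|Subject|Date|Body\n"
    "0001|alice@example.com|bob@example.com|Status|"
    "Mon, 01 Jan 2001 09:00:00 -0600|First line of text\n"
)


def record_to_message(record):
    """Build an RFC 5322 style message from one load file record."""
    msg = EmailMessage()
    msg["From"] = record["From"]
    msg["To"] = record["To"]
    msg["Subject"] = record["Subject"]
    msg["Date"] = record["Date"]
    # Keep the original document ID so the message can be traced
    # back to its load file record (hypothetical header name).
    msg["X-Load-File-DocID"] = record["DocID"]
    msg.set_content(record["Body"])
    return msg


def load_file_to_messages(text, delimiter="|"):
    """Parse delimited load file text into a list of EmailMessage objects."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [record_to_message(row) for row in reader]


messages = load_file_to_messages(SAMPLE_LOAD_FILE)
for message in messages:
    print(message["X-Load-File-DocID"], message["Subject"])
```

Each resulting message can be serialized with `message.as_bytes()` and written out as an `.eml` file. Packaging messages into a PST is a separate step that this sketch does not attempt.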
Because the email was processed for eDiscovery, varying levels of restoration can be performed beyond simply converting the load file format to email format. Some of these have been implemented in this data set. Additional steps, such as recreating Notes email, are scheduled for future work. The description paper will discuss this.
As mentioned above, please send us your questions on this data set so we can answer in our formal description as well as informally beforehand.