I’m pleased to announce that an initial version of the EDRM Enron Email Data Set, consisting of 40 GB of PST files with attachments and folder structure, is available within the EDRM project as of the EDRM 2009-2010 Kick-Off Meeting. The EDRM Data Set Project is now working to make this data set publicly available.
This initial data set was created by a team at ZL Technologies and me; however, more work remains, and I think the EDRM Data Set Project is an ideal group to lead the effort to publish some industry-standard data sets.
The issues the EDRM Data Set Project will be looking at include addressing privacy concerns, publishing smaller slices of the data set, and choosing distribution methods for large data sets. If you would like to participate in this process, please join EDRM.
EDRM Data Set Project Lead