Open Archive Formats

An increasingly important aspect of email and file management is the issue of open vs. closed file formats. Open formats are gaining popularity and allow organizations to retain control of their own data without the costs often associated with vendor lock-in. Tolerance for high switching costs, and sometimes high operational costs, is giving way to demand for open standard formats. The file format debate has shifted from open vs. closed to which open standard to choose, as in the discussions of ODF vs. OOXML and, to a lesser extent, PDF vs. XPS.

Parallel to the debate over living file formats, the question of archival formats is just as important and growing in criticality. Its importance is highlighted by the recent amendments to the Federal Rules of Civil Procedure (FRCP), which extended their coverage to electronically stored information (ESI), including email and files. In addition to these legal requirements, organizations face ever-increasing amounts of ESI that they must manage, often reaching billions of records and terabytes of storage. As the amount of data grows, so do the negative consequences of storing ESI in a proprietary, closed format. It is time for vendors and organizations to move to open standard archival formats.

An example

To highlight the importance of this issue, it is instructive to look at an example: Symantec Enterprise Vault (EV). Enterprise Vault is a popular content archiving solution. Like many content archiving solutions, EV began its life as an email management solution and has evolved into a more general-purpose records management solution, adding file system archiving, categorization, retention management, ILM, and preservation hold capabilities. These capabilities are important, but let’s look at how it stores files. EV archives a variety of content, including native email formats (MS Exchange and Lotus Domino) and file formats such as MS Office, PDF, and ODF. These native formats include both open and closed formats. When EV archives a file, it creates a proprietary EV Digital Vault Saveset (DVS) file that encapsulates the native file, an HTML copy, and some metadata. The problem is that once the file is in DVS format, the only way to read it is with Symantec tools, and neither the specification nor the tools are freely available. An organization’s open format files (ODF, OOXML, PDF, etc.) suddenly cannot be opened or indexed by other solutions.

What can vendors do?

What can be done about this situation? Vendors can begin moving to open standard formats. To see how this can work, let’s use the DVS file as an example again. The DVS file is a proprietary container that includes the native format file, an HTML conversion copy, and some metadata. HTML is already an open standard, and the metadata can be written as XML. The DVS container itself can then be replaced with an open format such as ZIP. This would turn the file into an open standard format in much the same way that Microsoft’s OOXML and XPS are designed. To see this, simply change the extension of an OOXML file from, say, .docx to .zip and open it in your favorite unzip utility. The technology is available for vendors to move to open standards, and organizations can make open formats part of their requirements.
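The repackaging described above can be sketched in a few lines. This is a minimal illustration under assumptions, not Symantec's DVS layout or any published archive standard: the member names (`native/...`, `html/rendition.html`, `metadata.xml`), the `archiveItem` root element, and the `build_open_saveset` function are all hypothetical. It uses only Python's standard library (`zipfile`, `xml.etree.ElementTree`), which is the point: any ZIP tool and any XML parser can read the result.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_open_saveset(native_name: str, native_bytes: bytes,
                       html_text: str, metadata: dict) -> bytes:
    """Package a native file, its HTML rendition, and XML metadata into a ZIP.

    Hypothetical layout, assumed for illustration only:
      native/<file name>   - the file exactly as it was archived
      html/rendition.html  - the HTML conversion copy
      metadata.xml         - metadata as simple XML elements
    Metadata keys must be valid XML element names.
    """
    # Write each metadata field as a child element of a hypothetical root element.
    root = ET.Element("archiveItem")
    for key, value in metadata.items():
        ET.SubElement(root, key).text = str(value)
    metadata_xml = ET.tostring(root, encoding="utf-8", xml_declaration=True)

    # Build the ZIP container in memory; the native file is stored unchanged.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("native/" + native_name, native_bytes)
        zf.writestr("html/rendition.html", html_text)
        zf.writestr("metadata.xml", metadata_xml)
    return buffer.getvalue()


if __name__ == "__main__":
    saveset = build_open_saveset(
        "report.odt",
        b"...native file bytes...",
        "<html><body>Report</body></html>",
        {"sender": "alice@example.com",
         "archivedAt": datetime.now(timezone.utc).isoformat()},
    )
    # Reading it back needs nothing but standard tools.
    with zipfile.ZipFile(io.BytesIO(saveset)) as zf:
        print(zf.namelist())  # ['native/report.odt', 'html/rendition.html', 'metadata.xml']
        print(ET.fromstring(zf.read("metadata.xml")).findtext("sender"))
```

The same `zipfile.ZipFile(path).namelist()` call also lists the parts of an OOXML file such as a .docx without renaming it, since `zipfile` does not look at the extension.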

What can you do?

Here are some items organizations should consider when managing their data:

  1. When choosing a solution, ensure that it stores files in open standard formats. When running a proof of concept (POC), ask your engineers whether they can copy the files onto a separate system and read them without proprietary vendor tools.
  2. If your current vendor does not support open standard formats, ask when that support is on their roadmap.
  3. If the vendor does not currently support open standard formats and you are uncomfortable with that, consider migrating to a solution that does. The amount of ESI inside organizations grows every day, and the sooner a migration is made, the less time and cost it will incur.
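The POC test in item 1, copying archived files to a separate system and reading them without vendor tools, can be partly scripted. The sketch below assumes, hypothetically, that the solution exports each item as a ZIP-based container that should include a `metadata.xml` member; the function name `find_unreadable` and that member name are illustrative assumptions to adjust to whatever layout the vendor actually documents. Because it uses only Python's standard library, a clean result suggests no proprietary reader is needed.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def find_unreadable(export_dir: str, metadata_member: str = "metadata.xml") -> dict:
    """Return {file path: reason} for exported files that standard tools cannot read.

    Assumes (hypothetically) that every exported item is a ZIP container
    holding an XML metadata member named `metadata_member`.
    """
    problems = {}
    for path in sorted(Path(export_dir).rglob("*")):
        if not path.is_file():
            continue
        if not zipfile.is_zipfile(path):
            # e.g. a proprietary container that only vendor tools can open
            problems[str(path)] = "not a ZIP container"
            continue
        try:
            with zipfile.ZipFile(path) as zf:
                bad_member = zf.testzip()  # reads every member and checks its CRC
                if bad_member is not None:
                    problems[str(path)] = f"corrupt member: {bad_member}"
                elif metadata_member not in zf.namelist():
                    problems[str(path)] = f"missing {metadata_member}"
                else:
                    ET.fromstring(zf.read(metadata_member))  # must be well-formed XML
        except Exception as exc:  # any failure means standard tools could not read it
            problems[str(path)] = f"unreadable: {exc}"
    return problems


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for file_path, reason in find_unreadable(sys.argv[1]).items():
            print(f"{file_path}: {reason}")
    else:
        print("usage: python check_export.py EXPORT_DIR")
```

Run it against the copied export directory on the separate system; an empty report means every file opened with generic tools.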

If a decision is made to migrate away from closed format solutions, some options may be available, though at a cost. TransVault and Procedo are two providers that can assist. TransVault makes a product of the same name that can migrate data out of Symantec Enterprise Vault, Autonomy/ZANTAZ EAS, and OpenText IXOS, and it has a growing Partner Network that provides services, including Instant InfoSystems in the US. Procedo offers a similar solution, the Procedo Archive Migration Manager.


As organizations start to manage their files and email in content archiving and management solutions, it is becoming increasingly important to maintain the data’s readability. The need for open standard document formats has been established, and it is time for the same philosophy to extend to the archiving solutions designed to preserve them. Organizations can maintain control of their content by deploying an open standards solution, encouraging vendors to support open standards, or migrating to solutions that support open standards. In the long run, the open standards approach is the most logical choice.

Photo courtesy of Elliott Brown.

