Genii Weblog
OpenXML4J Scenarios
Fri 1 Jun 2007, 06:01 PM
by Ben Langhinrichs
Copyright © 2007 Genii Software Ltd.
What has been said:
596.1. Ian Randall (06/03/2007 08:49 PM)
I assume that this blog topic was triggered by the changes (which came into effect on December 1st 2006) to the Federal Rules of Civil Procedure (FRCP), which require organizations to adapt how they manage, retain, store and deliver electronically stored information (ESI) during the eDiscovery phase of legal proceedings.
In particular, this means ensuring that corporate policies and procedures for document and email retention, eDiscovery readiness, and metadata management comply with the new FRCP legislation.
This is likely to be a very major international issue, impacting Lotus Notes email, integration with MS Office files, ODF Editors etc.
If multi-national organisations get this wrong, the FRCP changes can have huge consequences for their global infrastructure and may also stop organisations (even outside of the US) from carrying on normal IT operations such as deleting spam, setting storage limits on email etc.
Deleting metadata from MS Office Documents (or any other file format) is only the tip of the iceberg.
596.2. Ian Randall (06/07/2007 01:13 AM)
Modifying the file format of some large Word documents might not be such a bad idea.
I did a test today converting a 60MB Word document into ODF format using the IBM ODF editor. The result was 5.6MB, with reasonable fidelity after the conversion.
The only major issue that I had with formatting was the Table of Contents. Converting with OpenOffice 2.0 did a better job on the Table of Contents and produced a file of almost exactly the same size (5.6MB).
However, I made the mistake of trying to convert the same open (unsaved) document to PDF using the IBM ODF editor, and then tried to print it out with a PDF print driver (rather than the built-in PDF converter). That exploded the file to over 2GB before I had to abort the process after 15 minutes. I guess that is what Beta code is meant to do.
I wonder if converting Word .doc format to OpenXML is going to produce similar results?