Genii Weblog

ODF and bloat - When in doubt, jar it

Wed 12 Jul 2006, 06:37 PM



by Ben Langhinrichs
This post was inspired by a comment/question from Paul Ryan (#477.7) regarding an earlier post.  Paul says:
That said, honestly, I didn't find this topic particularly interesting, except to note that ODF consists entirely of XML, something that I hadn't quite registered before. Therefore, and as your example clearly demonstrates, ODF files are going to be very fat compared with proprietary formats like Notes rich-text or Microsoft formats.

Digressive musing...what XML really needs is a widely-used, maybe even compulsory, compression component to combat the bloat problem. Maybe there is something like that out there, and I'm just not aware of it.

As a lawyer might say, Paul, asked and answered.  Yes, XML is fairly heavy, and the way ODF is implemented is even heavier than it would need to be, although still not as heavy as Office Open XML (OOXML) seems to be.

So, the obvious answer is to compress the whole thing, which is just what both ODF and Office Open XML do, and in almost exactly the same way.  When you see an ODF file such as ThisDoc.odt, you are really seeing a zipped repository with several files inside it.  Technically, it is even more specific than a zip file: it is a "JAR file", which is to say exactly the same format as a Java Archive package.  There are usually several files and subdirectories in such a package, although the only required files are META-INF/manifest.xml and content.xml, which describe the contents of the JAR file and the content of the document, respectively.
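
To see that structure for yourself, here is a minimal sketch using Python's standard zipfile module (my choice of tool; nothing in ODF requires Python). It builds a tiny, hypothetical ODF-style package in memory, not a document any office suite is guaranteed to open, and then lists its entries after checking for the two required files. The helper names build_sample_package and describe_package are illustrative only. One detail the post doesn't mention: the ODF specification asks that the optional mimetype entry come first in the package and be stored uncompressed, so the sketch does that too.

```python
import io
import zipfile

# Minimal, illustrative manifest; a real one lists every file in the package.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0">
 <manifest:file-entry manifest:media-type="application/vnd.oasis.opendocument.text" manifest:full-path="/"/>
 <manifest:file-entry manifest:media-type="text/xml" manifest:full-path="content.xml"/>
</manifest:manifest>
"""

# Minimal, illustrative document body.
CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
 xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" office:version="1.0">
 <office:body><office:text><text:p>Hello, ODF</text:p></office:text></office:body>
</office:document-content>
"""


def build_sample_package() -> bytes:
    """Build a tiny ODF-style package in memory (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Per the ODF spec, mimetype (if present) should be first and uncompressed.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/manifest.xml", MANIFEST, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("content.xml", CONTENT, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def describe_package(data: bytes) -> list[str]:
    """Return the entry names of a zip-based package after checking the two key files."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
    for required in ("META-INF/manifest.xml", "content.xml"):
        if required not in names:
            raise ValueError(f"not an ODF-style package: missing {required}")
    return names


if __name__ == "__main__":
    for name in describe_package(build_sample_package()):
        print(name)
```

Passing describe_package the bytes of a real .odt instead, for example open("ThisDoc.odt", "rb").read(), should list everything your office suite put in the package.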

But is it any good knowing this?  Sure, it makes clear why ODF files are not as humungous as they might otherwise be, since zip compression works very well on repetitive XML markup, but what else is it good for?
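
As a rough illustration of why the compression helps so much, the sketch below Deflate-compresses some deliberately repetitive, made-up ODF-style markup with Python's zlib module. Zip entries use the same Deflate algorithm; zlib.compress just adds a small header and checksum, which makes no real difference here. The sample is an assumption chosen to show the effect, and real documents, with varied text, will not shrink nearly as dramatically.

```python
import zlib


def compression_ratio(xml_text: str, level: int = 6) -> float:
    """Deflate-compress the UTF-8 bytes of xml_text; return compressed/original size."""
    raw = xml_text.encode("utf-8")
    return len(zlib.compress(raw, level)) / len(raw)


# Made-up sample: the same verbose ODF-style paragraph markup repeated many times.
PARAGRAPH = '<text:p text:style-name="Standard">Lorem ipsum dolor sit amet.</text:p>\n'
SAMPLE = "<office:text>\n" + PARAGRAPH * 500 + "</office:text>\n"

if __name__ == "__main__":
    size = len(SAMPLE.encode("utf-8"))
    print(f"{size} bytes of XML deflate to about {compression_ratio(SAMPLE):.1%} of that")
```

The verbose, repeated tag and attribute names that make XML "fat" are exactly what Deflate's back-references eliminate, which is why the zipped package ends up far smaller than the raw XML.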

Well, for one thing, it is good for extracting images.  Unlike a Word .doc file or a Notes rich text field, an ODF file lets you get at all of its images simply by renaming the .odt to .zip and unzipping the graphics files.  You can also alter the content.xml file by hand or with some other utility and re-zip it, so long as encryption has not been set up on the JAR file.  This is like fiddling with DXL, except it is more reasonably structured.
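
Both tricks can be scripted rather than done by hand.  The sketch below, again assuming Python's zipfile module, copies image entries out of a package and rewrites content.xml through a caller-supplied function.  Several things are assumptions: the function names are hypothetical, images are recognized by file extension rather than by location in the package, and the encryption check is a simple heuristic that looks for encryption-data in the manifest, which is where ODF records per-file encryption.  When re-zipping, it writes mimetype first and uncompressed and keeps every other entry's original compression method.

```python
import zipfile
from pathlib import Path
from typing import Callable

# Assumed set of image extensions; adjust for your documents.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".bmp", ".wmf", ".emf"}


def extract_images(odt_path: str, out_dir: str) -> list[str]:
    """Copy every image-like entry out of an ODF package; return the entry names copied."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    with zipfile.ZipFile(odt_path) as zf:
        for info in zf.infolist():
            if Path(info.filename).suffix.lower() in IMAGE_SUFFIXES:
                # Keep only the base name (Pictures/a.png -> a.png) so nothing is
                # written outside out_dir; same-named images would overwrite each other.
                (out / Path(info.filename).name).write_bytes(zf.read(info))
                copied.append(info.filename)
    return copied


def rewrite_content(odt_path: str, new_path: str, edit: Callable[[str], str]) -> None:
    """Copy an unencrypted ODF package, passing content.xml through edit()."""
    with zipfile.ZipFile(odt_path) as src:
        manifest = src.read("META-INF/manifest.xml").decode("utf-8")
        # Heuristic: ODF lists per-file encryption as <manifest:encryption-data>.
        if "encryption-data" in manifest:
            raise ValueError("package is encrypted; content.xml cannot be edited directly")
        with zipfile.ZipFile(new_path, "w") as dst:
            # The spec asks that mimetype, if present, be first and uncompressed.
            if "mimetype" in src.namelist():
                dst.writestr("mimetype", src.read("mimetype"),
                             compress_type=zipfile.ZIP_STORED)
            for info in src.infolist():
                if info.filename == "mimetype":
                    continue
                data = src.read(info)
                if info.filename == "content.xml":
                    data = edit(data.decode("utf-8")).encode("utf-8")
                # Keep each remaining entry's original compression method.
                dst.writestr(info.filename, data, compress_type=info.compress_type)
```

For example, rewrite_content("ThisDoc.odt", "ThisDoc-edited.odt", lambda xml: xml.replace("Draft", "Final")) would produce an edited copy and leave the original untouched; the file names and replacement text are purely illustrative.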

So, for what it is worth, there you have it.

Copyright © 2006 Genii Software Ltd.
