Genii Weblog

How "lossy" is your data conversion?

Wed 28 Sep 2005, 02:49 PM

by Ben Langhinrichs
The term lossy is often used in image compression to describe a format that retains most of the original but loses some detail.  For example, JPEG is a lossy image compression format, which saves space but risks some diminution of fidelity.  The term lossy is also used for various other data compression formats, including ZIP and RAR.

But what about lossy data conversion?  In particular, "rich text" formats such as MS RTF, HTML, XHTML, MIME, Word and Notes rich text may all represent formatted data in ways that look alike but are different under the covers.  What is worse is that these formats almost all have capabilities, both visual and action-oriented, that do not match up.  Specifically, I spend a lot of time on Notes rich text to HTML/XHTML and HTML/XHTML to Notes rich text, especially with CoexEdit.  The obvious goal is to minimize the loss in any conversion, but how can you measure the loss?

The following three graphics might help explain what I mean.  The first is a Word document, which I then copied and pasted into the browser window for FCKEditor using CoexEdit.  I used a Firefox browser, but the result is similar with Internet Explorer.  After pasting into FCKEditor, I saved, which let CoexEdit do its bit of magic, then switched to my Notes client, where the final graphic is shown.

So, is this "good enough"?  I'm not sure.  There are a few slight differences, such as round bullets in MS Word becoming triangular bullets in FCKEditor and then back to round bullets in Notes, but that is probably just the presentation of the default bullet type.  There also seems to be a spacing issue after the Version 2.0, where an extra space has been added.  Are those good enough?  Only time (and our customers, who are always right) will tell.
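One rough way to put a number on this kind of difference is to compare the visible text before and after a round trip.  The sketch below is only an illustration under stated assumptions, not how CoexEdit works: it assumes both versions are available as HTML strings, and the names visible_text and conversion_loss are made up for this example.  It uses only the Python standard library.

```python
import difflib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML fragment and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Hypothetical helper: return only the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def conversion_loss(before_html, after_html):
    """Hypothetical metric: 0.0 means identical text, values near 1.0 mean
    almost nothing in common.  Formatting differences are not measured."""
    matcher = difflib.SequenceMatcher(
        None, visible_text(before_html), visible_text(after_html), autojunk=False
    )
    return 1.0 - matcher.ratio()


if __name__ == "__main__":
    before = "<p>Version 2.0 released</p>"
    after = "<p>Version 2.0  released</p>"  # extra space added by a conversion
    print(conversion_loss(before, after))
```

A score like this would flag an added space such as the one after Version 2.0, but it is blind to anything that lives in the markup, so a changed bullet style or a lost font would score as no loss at all.  Measuring that kind of loss would need a comparison of the formatting itself, which this sketch does not attempt.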

MS Word (This is the original Word document, which I then copied)

Inline JPEG image

FCKEditor using CoexEdit called from Firefox (I pasted the Word content in here)

Inline JPEG image

Notes view of same document (CoexEdit handled auto-conversion from FCKEditor)

Inline JPEG image

Copyright 2005 Genii Software Ltd.

What has been said:

372.1. Nathan T. Freeman
(09/29/2005 01:16 AM)

Who used "lossy" when discussing ZIP or RAR files? They are lossless. MP3 is a lossy format that pretty much everyone knows now, though.

372.2. Ben Langhinrichs
(09/29/2005 10:25 PM)

Good point.

372.3. Stan Rogers
(10/03/2005 12:51 PM)

C'mon Nathan -- compressed files will fit on smaller disks and flash drives, which are quite a bit easier to lose than their larger brethren. Thus "lossy" compression. Don't you keep up with the literature?