Ben Langhinrichs

Genii Weblog

How "lossy" is your data conversion?

Wed 28 Sep 2005, 02:49 PM



by Ben Langhinrichs
The term lossy is often used in image compression to describe a format that retains most of the original but loses some detail.  For example, JPEG is a lossy image compression format, which saves space but risks some diminution of fidelity.  The term is also used for other media compression formats, such as MP3 audio, whereas archive formats such as ZIP and RAR are lossless.

But what about lossy data conversion?  In particular, "rich text" formats such as MS RTF, HTML, XHTML, MIME, Word and Notes rich text may all represent formatted data in ways that look alike but are different under the covers.  What is worse is that these formats almost all have capabilities, both visual and action oriented, that do not match up.  Specifically, I spend a lot of time on Notes rich text to HTML/XHTML and HTML/XHTML to Notes rich text, especially with CoexEdit.  The obvious goal is to minimize the loss in any conversion, but how can you measure the loss?

The following three graphics might help explain what I mean.  The first is a Word document, which I then copied and pasted into the browser window for FCKEditor using CoexEdit.  I used a Firefox browser, but the result is similar with Internet Explorer.  After pasting into FCKEditor, I saved, which let CoexEdit do its bit of magic, then switched to my Notes client, where the final graphic is shown.

So, is this "good enough"?  I'm not sure.  There are a few slight differences, such as round bullets in MS Word becoming triangular bullets in FCKEditor and then back to round bullets in Notes, but that is probably just the presentation of the default bullet type.  There also seems to be a spacing issue after the Version 2.0, where an extra space has been added.  Are those good enough?  Only time (and our customers, who are always right) will tell.

MS Word (This is the original Word document, which I then copied)

Inline JPEG image

FCKEditor using CoexEdit called from Firefox (I pasted the Word content in here)

Inline JPEG image

Notes view of same document (CoexEdit handled auto-conversion from FCKEditor)

Inline JPEG image
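
One rough way to put a number on the loss discussed above is to compare what went into a round trip with what came out.  The sketch below is only an illustration and is not how CoexEdit works: it assumes both the original and the round-tripped content are available as HTML strings, and the names FeatureCollector, extract and fidelity are made up for this example.  It scores the visible text and the sequence of tags separately, so a stray extra space shows up in one score and a changed list or bullet structure shows up in the other.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class FeatureCollector(HTMLParser):
    """Hypothetical helper: gathers visible text and the order of opening tags."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Record only the tag name; attributes are ignored in this sketch.
        self.tags.append(tag)

    def handle_data(self, data):
        self.text_parts.append(data)


def extract(html):
    """Return (visible_text, list_of_tag_names) for an HTML fragment."""
    collector = FeatureCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.text_parts), collector.tags


def fidelity(original_html, round_tripped_html):
    """Return (text_ratio, tag_ratio); each is 1.0 when the two sides match exactly."""
    text_a, tags_a = extract(original_html)
    text_b, tags_b = extract(round_tripped_html)
    text_ratio = SequenceMatcher(None, text_a, text_b).ratio()
    tag_ratio = SequenceMatcher(None, tags_a, tags_b).ratio()
    return text_ratio, tag_ratio


if __name__ == "__main__":
    before = "<p><b>Version 2.0</b> released</p>"
    after = "<p><b>Version 2.0</b>  released</p>"  # one extra space crept in
    print(fidelity(before, after))
```

A score like this says nothing about whether a difference matters to a reader, so it is best treated as a way to flag conversions worth inspecting by eye rather than as a verdict on "good enough".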

Copyright 2005 Genii Software Ltd.

What has been said:


372.1. Nathan T. Freeman
(09/29/2005 01:16 AM)

Who used "lossy" when discussing ZIP or RAR files? They are lossless. MP3 is a lossy format that pretty much everyone knows now, though.


372.2. Ben Langhinrichs
(09/29/2005 10:25 PM)

Good point.


372.3. Stan Rogers
(10/03/2005 12:51 PM)

C'mon Nathan -- compressed files will fit on smaller disks and flash drives, which are quite a bit easier to lose than their larger brethren. Thus "lossy" compression. Don't you keep up with the literature?