Ben Langhinrichs

Genii Weblog

Midas vs. DXL: When performance matters

Thu 29 Sep 2005, 09:55 AM



by Ben Langhinrichs
The addition of DXL processing has been a great bonus for customers who need rich text manipulation, as there is now a native way to do many of the things that could previously only be done with the C API or with our Midas Rich Text LSX. So the question sometimes comes up: why bother with Midas for rich text manipulation? Sure, it does other things DXL can't touch, such as HTML generation, MIME e-mails, rich text comparison, contextual link matching and so on, but what about simple stuff like adding a table? To address that, take a look at this question from the Notes 6 & 7 Gold forum:
I am writing code to generate a NotesDocument with a tailor-made table (different cell colors, merged cells) by DXL.

I found that using DXL to generate a document with a table is much slower than using LS-RT tables, especially when the table is large (9 sec vs. 2 sec for a 52-row by 16-column table). However, DXL offers more flexible control over complex table-making.

Is there any way to optimize the DXL code?
Excuse me, but 9 seconds?  To generate a single document with a table?

I responded with this, being as polite as I could be:
If you really want performance and control over rich text at the same time, DXL isn't going to work well for you, and neither will the native LS rich text classes.  Both are just way too slow.  If you need performance, you either need an API solution, which will be pretty complex to write, or a third party product such as our Midas Rich Text LSX, which combines high levels of flexibility with extremely high performance.

As an example, I tried creating a table such as you describe, with different cell colors, merged cells, 52 rows and 16 columns. I tried once creating the whole table in one call (without the merged cells), and again with every row created separately to allow merging and changing various cell attributes. In both cases, creating a single table was too fast to measure, so I created 100 separate documents with a table like this in each.

Midas Rich Text LSX times

                  Table created in one call    Table created a row at a time
100 documents     4 seconds                    15 seconds


As you can see, well under a second per document (40 ms in one call, 150 ms a row at a time) beats either the 9 seconds DXL takes or the 2 seconds the LS RT classes take. Of course, you may not need that level of performance, but if you are creating 25,000 documents and doing other processing (as a recent customer described), you might want it to complete in less than the 62.5 hours the DXL processing would require, or even the almost 14 hours the rich text classes would take, even if they could handle this. The customer's job is slightly different, but it does involve appending large tables a row at a time, and they report that the task runs in a bit less than 1 hour, even with all the additional processing.
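The batch-job figures above follow directly from the per-document times quoted in the post; a quick back-of-envelope calculation (all inputs taken from the numbers already given) confirms them:

```python
# Back-of-envelope batch times for 25,000 documents, using the
# per-document times quoted in the post above.
DOCS = 25_000

dxl_secs = 9.0          # DXL: 9 seconds per document
lsrt_secs = 2.0         # LS rich text classes: 2 seconds per document
midas_secs = 4.0 / 100  # Midas, one call per table: 4 seconds per 100 documents

print(f"DXL:   {DOCS * dxl_secs  / 3600:.1f} hours")   # 62.5 hours
print(f"LS RT: {DOCS * lsrt_secs / 3600:.1f} hours")   # ~13.9 hours
print(f"Midas: {DOCS * midas_secs / 60:.0f} minutes")  # ~17 minutes
```

Even before counting the customer's additional processing, the Midas figure leaves most of that "bit less than 1 hour" budget free.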


So, while DXL is great, don't start using it unless you have lots of patience and like going on coffee breaks.  Nine seconds!?!?

Copyright © 2005 Genii Software Ltd.

What has been said:


366.1. Stan Rogers
(09/29/2005 07:57 PM)

Actually, it's about half a second plus the import time (derived by creating a deliberately illegal child element early in a test document to make validation fail on import, then removing the illegal element and testing again). I've done some pretty complex rich text stuff in DXL, and I was really surprised by the times given in that posting -- I create a newsletter by getting document text from a CMS database, formatting it using user-defined stylesheets in another location for consistency, importing pictures, and placing everything into a user-defined newsletter template, and all of that happens in well under two seconds. At first, I thought there had to be something wrong with the poster's code, so I thought I'd do a quick test to confirm and then tell him so.

The NotesDXLImporter really, really doesn't like big tables -- the import time is just about proportional to the number of table cells, as far as I can tell. The current template for my newsletter is not exactly simple, but it's only a five-by-six table nested in a four-by-two table, so there aren't that many table cells involved altogether. I suspect that things are being checked, double-checked, QA'd, signed off, voted upon, and then checked again before the document(s) are actually created. Not that DXL is ever going to be as quick as Midas (or even nearly so), but big tables hitting the DXL importer seems to be the bottleneck.
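Stan's two observations can be folded into a rough per-cell cost model. The half-second base and the nine-second total come from the comments above; treating everything beyond the base as a uniform per-cell import cost is my extrapolation, not a measured figure:

```python
# Rough cost model for NotesDXLImporter, assuming (per the comment above)
# import time = base overhead + a roughly constant cost per table cell.
base_secs = 0.5    # non-table overhead, measured via the failed-validation trick
total_secs = 9.0   # reported time for the 52x16 table
cells = 52 * 16    # 832 cells

per_cell_ms = (total_secs - base_secs) / cells * 1000
print(f"~{per_cell_ms:.1f} ms per cell")  # ~10.2 ms

# Stan's newsletter template: a 5x6 table nested in a 4x2 table = 38 cells
newsletter_cells = 5 * 6 + 4 * 2
est = base_secs + newsletter_cells * per_cell_ms / 1000
print(f"estimated newsletter import: ~{est:.2f} s")  # ~0.89 s
```

The estimate for the small nested table lands comfortably under the "well under two seconds" Stan reports, which is at least consistent with the cells-dominate-import-time hypothesis.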


366.2. Ben Langhinrichs
(09/29/2005 08:11 PM)

Well, to be fair, everything about big tables is slow in Notes. Just opening a document with a big table is slow in Notes, and I have spent enormous amounts of time tuning Midas to try to make table manipulation faster. Partly, I do caching of the CD record stream, because the subsequent rows are what kill the process. If I turn off caching in Midas, the number goes to 34 seconds for 100 documents, even with all the other tuned code.

I guess the point to remember, too, is that there are basically two ways people use tables. They are either working on a single document at a time, in which case anything under a second is probably "fast enough", or they are working on lots of documents, in which case the faster the better. Even at 2 seconds, the amount of time to do a whole load of processing (like the example I mentioned in my post) just shoots up. On modern computers, we are used to processing tens to hundreds of documents a second. For example, on our Midas LSX page, I cite the example of HTML generation by Midas at 900 documents a second. That is really, really fast, but send it off to do a million documents and it still takes almost twenty minutes. Imagine times of even half a second per document: a million-document process would take close to a week! Even 100,000 documents would take all day. It is the big batch jobs where performance is really key. For single documents where you don't have really big tables, DXL is plenty fast enough.
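The throughput claims in that comment check out the same way; the 900 documents/second rate and the half-second-per-document scenario are both from the comment itself:

```python
# Sanity-check the batch-throughput claims above.
MILLION = 1_000_000

# Midas HTML generation at 900 documents/second
print(f"900 docs/s, 1M docs: {MILLION / 900 / 60:.1f} min")      # ~18.5 ("almost twenty minutes")

# Half a second per document
print(f"0.5 s/doc, 1M docs: {MILLION * 0.5 / 86_400:.1f} days")  # ~5.8 ("close to a week")
print(f"0.5 s/doc, 100k docs: {100_000 * 0.5 / 3600:.1f} h")     # ~13.9 ("all day")
```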


366.3. Ben Langhinrichs
(09/29/2005 11:22 PM)

Just a try