Ben Langhinrichs

Genii Weblog

Numbers and success and why it matters

Wed 3 May 2017, 11:54 AM



by Ben Langhinrichs
I posted this over on Facebook, but thought I'd share it here as well. We've been doing some performance tuning on CoexLinks Migrate, which lets you export your email, both MIME and rich text, to high fidelity standardized formats including MBOX and EML and Exchange Mail Journal Envelope format (basically a wrapped up EML file with all the recipients and such in an envelope).
 
Performance tuning is fun, but it isn't sexy. You take Software A which creates End Product B, work incredibly hard to make sure it still creates the exact same damn End Product B, but faster. Same input, same output, less time. Not sexy.
 
But it can be satisfying. Here are the results from my test this morning (run on an old PC, so your mileage may vary, but it is likely to be better).
 
Inline JPEG image
 
Performance tuning seems to be helping. This is exporting to MBOX format, which is much faster than individual EML files, but 6000/minute means averaging 1/100 of a second per document. Wow. This generated an MBOX file with size 0.93GB from a mail db of 1.62GB, for what it is worth. When I ran the same test on the same database generating EML files, it did about 2600/minute and generated 1.8GB in total EML files. That's overhead for you.
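For anyone who wants to check the arithmetic, here is a minimal Python sketch that uses only the figures quoted above. The function and variable names are made up for illustration and have nothing to do with CoexLinks Migrate itself.

```python
def seconds_per_document(docs_per_minute: float) -> float:
    """Average time per document at a given throughput."""
    return 60.0 / docs_per_minute


# Figures quoted in the post (hypothetical variable names).
MBOX_RATE = 6000      # documents per minute
EML_RATE = 2600       # documents per minute
SOURCE_GB = 1.62      # size of the source mail db
MBOX_GB = 0.93        # size of the generated MBOX file
EML_GB = 1.8          # total size of the generated EML files

mbox_seconds = seconds_per_document(MBOX_RATE)
eml_seconds = seconds_per_document(EML_RATE)

# Output size relative to the source database.
mbox_ratio = MBOX_GB / SOURCE_GB
eml_ratio = EML_GB / SOURCE_GB

print(f"MBOX: {mbox_seconds:.3f} s/doc, output is {mbox_ratio:.0%} of source size")
print(f"EML:  {eml_seconds:.3f} s/doc, output is {eml_ratio:.0%} of source size")
```

In this one test, the EML output came out a little larger than the source database, while the MBOX output was a bit over half its size.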
 
In case you wonder about the practical need for this kind of speed, we have a client with 5TB of archived email. Using a very rough approximation based on my own mailbox, that would take about 96 hours (approximately 4 days) to generate as MBOX. It would take roughly 220 hours for EML format (approximately 9 days).
 
When we started tuning, it averaged 950/minute for EML (we didn't measure for MBOX then). At that rate, it would have taken 602 hours (approximately 25 days). Now, because these are separate databases, you would either run it on multiple processors or multiple machines, but even spreading the work across 10 machines/processors, it would take 9.6 hours, 22 hours and 60 hours respectively. The longer it takes, the more chance of something going wrong and having to start that part over. In short, speed matters. Fast enough, and you can even re-do the whole thing if some assumption turns out to be wrong. Slow enough, and a mistake can make you miss deadlines.
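The estimates above follow from one assumption: for a fixed pile of documents, total time scales inversely with throughput. A rough Python sketch of that scaling follows, starting from the post's 96-hour MBOX estimate. The helper name `scale_hours` is hypothetical. The even split across 10 workers is the same simplification the post makes.

```python
def scale_hours(known_hours: float, known_rate: float, new_rate: float) -> float:
    """Rescale a time estimate from one throughput (docs/minute) to another,
    assuming the same number of documents."""
    return known_hours * known_rate / new_rate


MBOX_HOURS = 96.0  # the post's rough estimate for 5TB at 6000/minute

# EML at 2600/minute, derived from the MBOX estimate (the post rounds this to 220).
eml_hours = scale_hours(MBOX_HOURS, 6000, 2600)

# EML before tuning at 950/minute, derived from the post's rounded 220 hours.
original_hours = scale_hours(220.0, 2600, 950)

WORKERS = 10  # machines or processors, assuming the work splits evenly

for label, hours in [
    ("MBOX, tuned", MBOX_HOURS),
    ("EML, tuned", eml_hours),
    ("EML, before tuning", original_hours),
]:
    print(
        f"{label}: {hours:.0f} hours (~{hours / 24:.0f} days), "
        f"or {hours / WORKERS:.1f} hours each across {WORKERS} workers"
    )
```

Real runs will not divide this cleanly, since mail databases differ in size and a failed run has to be restarted, which is exactly why the per-document speed matters.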
 
But as they say on Reading Rainbow, you don't have to take my word for it. Request an evaluation license today and give it a spin.

Copyright © 2017 Genii Software Ltd.
