Ben Langhinrichs

Genii Weblog

Numbers and success and why it matters

Wed 3 May 2017, 11:54 AM



by Ben Langhinrichs
I posted this over on Facebook, but thought I'd share it here as well. We've been doing some performance tuning on CoexLinks Migrate, which lets you export your email, both MIME and rich text, to high-fidelity standardized formats including MBOX, EML, and Exchange Mail Journal Envelope format (basically a wrapped-up EML file with all the recipients and such in an envelope).
 
Performance tuning is fun, but it isn't sexy. You take Software A which creates End Product B, work incredibly hard to make sure it still creates the exact same damn End Product B, but faster. Same input, same output, less time. Not sexy.
 
But it can be satisfying. Here are the results from my test this morning (run on an old PC, so your mileage may vary, but it is likely to be better).
 
[Image: test results]
 
Performance tuning seems to be helping. This is exporting to MBOX format, which is much faster than individual EML files, but 6000/minute means averaging 1/100 of a second per document. Wow. This generated an MBOX file of 0.93GB from a mail db of 1.62GB, for what it is worth. When I ran the same test on the same database generating EML files, it did about 2600/minute and generated 1.8GB in total EML files. That's overhead for you.
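For anyone who wants to check that arithmetic, here is a minimal Python sketch. The function names are my own for illustration and are not part of CoexLinks Migrate; the inputs are just the rates and sizes quoted above.

```python
def seconds_per_document(docs_per_minute: float) -> float:
    """Average time spent on one document at a given throughput."""
    return 60.0 / docs_per_minute


def size_ratio(output_gb: float, source_gb: float) -> float:
    """Output size relative to the source mail database."""
    return output_gb / source_gb


# Figures quoted in the post.
MBOX_RATE = 6000   # documents per minute
EML_RATE = 2600    # documents per minute
SOURCE_GB = 1.62   # size of the mail database
MBOX_GB = 0.93     # size of the generated MBOX file
EML_GB = 1.8       # total size of the generated EML files

print(f"MBOX: {seconds_per_document(MBOX_RATE):.4f} s per document")
print(f"EML:  {seconds_per_document(EML_RATE):.4f} s per document")
print(f"MBOX output is {size_ratio(MBOX_GB, SOURCE_GB):.0%} of the source size")
print(f"EML output is {size_ratio(EML_GB, SOURCE_GB):.0%} of the source size")
```

On these numbers, EML export takes a bit more than twice as long per document and produces roughly twice the output of MBOX.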
 
In case you wonder about the practical need for this kind of speed, we have a client with 5TB of archived email. Using a very rough approximation based on my own mailbox, that would take about 96 hours (approximately 4 days) to generate as MBOX. It would take roughly 220 hours for EML format (approximately 9 days).
 
When we started tuning, it averaged 950/minute for EML (we didn't measure for MBOX then). At that rate, it would have taken 602 hours (approximately 25 days). Now, because these are separate databases, you would either run it on multiple processors or multiple machines, but even dividing these numbers by 10 machines/processors would take 9.6 hours, 22 hours and 60 hours respectively. The longer it takes, the more chance of something going wrong and having to start that part over. In short, speed matters. Fast enough, and you can even re-do the whole thing if some assumption turns out to be wrong. Slow enough, and a mistake can make you miss deadlines.
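The scaling behind those estimates can be sketched in a few lines of Python. This is only a back-of-the-envelope model under two assumptions of mine: total time is inversely proportional to throughput, and the work splits perfectly evenly across machines. The helper names are hypothetical, not anything in the product.

```python
def scaled_hours(known_hours: float, known_rate: float, other_rate: float) -> float:
    """Rescale a known duration to a different throughput (docs/minute)."""
    return known_hours * known_rate / other_rate


def hours_per_worker(total_hours: float, workers: int) -> float:
    """Wall-clock time if the job splits evenly across workers (an idealization)."""
    return total_hours / workers


# Estimates quoted in the post.
MBOX_HOURS = 96.0   # at 6000 docs/minute
EML_HOURS = 220.0   # at 2600 docs/minute (rounded in the post)

# Starting from the MBOX estimate, EML at 2600/minute comes out near 220 hours.
eml_from_mbox = scaled_hours(MBOX_HOURS, 6000, 2600)

# Starting from the rounded EML estimate, the pre-tuning rate of 950/minute
# comes out near 602 hours.
untuned_eml = scaled_hours(EML_HOURS, 2600, 950)

print(f"EML estimated from MBOX: {eml_from_mbox:.1f} hours")
print(f"EML before tuning:       {untuned_eml:.1f} hours")

for label, hours in [("MBOX", MBOX_HOURS), ("EML", EML_HOURS), ("EML before tuning", untuned_eml)]:
    print(f"{label}: {hours_per_worker(hours, 10):.1f} hours on 10 machines")
```

Real runs will not split this cleanly, since mail databases vary in size and one large one can dominate the wall-clock time, so treat the per-machine figures as a lower bound.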
 
But as they say on Reading Rainbow, you don't have to take my word for it. Request an evaluation license today and give it a spin.

Copyright © 2017 Genii Software Ltd.
