Sat 18 Jul 2020
Enduring favorite - Getting Data out of Notes (for whatever reason)
Thu 9 Jul 2020
Maximizing power while minimizing code and effort
Fri 29 May 2020
Round tripping, even while staying put
Numbers and success and why it matters
Wed 3 May 2017, 11:54 AMTweet
by Ben Langhinrichs
I posted this over on Facebook, but though I'd share it here as well. We've been doing some performance tuning on CoexLinks Migrate, which let's you export your email, both MIME and rich text, to high fidelity standardized formats including MBOX and EML and Exchange Mail Journal Envelope format (basically a wrapped up EML file with all the recipients and such in an envelope.
Performance tuning is fun, but it isn't sexy. You take Software A which creates End Product B, work incredibly hard to make sure it still creates the exact same damn End Product B, but faster. Same input, same output, less time. Not sexy.
But it can be satisfying. Here are the results from my test this morning (run on an old PC, so your mileage may vary, but it is likely to be better).
Performance tuning seems to be helping. This is exporting to MBOX format, which is much faster than individual EML files, but 6000/minute means averaging 1/100 of a second per document. Wow. This generated an MBOX file with size 0.93GB from a mail db of 1.62GB, for what it is worth. When I ran the same test on the same database generating EML files, it did about 2600/minute and generated 1.8GB in total EML files. That's overhead for you.
In case you wonder the practical needs for this kind of speed, we have a client with 5TB of archived email. Using a very rough approximation based on my own mailbox, that would take about 96 hours (approximately 4 days) generated to MBOX. It would take roughly 220 hours for EML format (approximately. 9 days).
When we started tuning, it averaged 950/minute for EML (we didn't measure for MBOX then). At that rate, it would have have taken 602 hours (approximately 25 days). Now, because these are separate databases, you would either run it on multiple processors or multiple machines, but even dividing these numbers by 10 machines/processors would take 9.6 hours, 22 hours and 60 hours respectively. The longer it takes, the more chance of something going wrong and having to start that part over. In short, speed matters. Fast enough, and you can even re-do the whole thing if some assumption turns out to be wrong. Slow enough, and a mistake can make you miss deadlines.
But as thay say on Reading Rainbow, you don't have to take my word for it. Request an evaluation license today and give it a spin.
Copyright © 2017 Genii Software Ltd.
What has been said: