Ben Langhinrichs

Genii Weblog


Civility in critiquing the ideas of others is no vice. Rudeness in defending your own ideas is no virtue.


Sat 16 Sep 2017, 02:24 PM
I wanted to demonstrate what our new Defect Detection (and correction) looks like, but the various examples I have are all covered under NDA. So, in this series, I will break some of my own images in the way the customer images are broken. (These are all scaled down from the original width to make them fit on the blog better.)
 
Original image (scaled down)
Inline JPEG image
 
 
1) Image header lists n segments, but there are only m (where m < n)
 
For my first example (going in the order of this blog post), I took the same image, saved it as both GIF and JPEG (using GIMP), imported each into Notes rich text, and then broke them using the Midas LSX. In the GIF format there are two "image segments" in rich text, while in the JPEG format there are six. This simply means that the raw image data is spread out over a few different CD records. When an image "breaks", it is often because some of the trailing image segments are lost, or rather overwritten. Below, I show the same image, first broken by removing the last few image segments (half of them), then fixed by the Defect Detection system in CoexLinks and AppsFidelity.
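To make the idea of lost trailing segments concrete, here is a minimal Python sketch. It is not how the Midas LSX or Defect Detection works internally; the function names are made up for illustration, and the "repair" shown (re-appending the format's end marker to the truncated data) is only one plausible approach, assumed here so that a tolerant decoder has a chance of showing the part of the image that survived.

```python
# Illustrative only: these helpers are hypothetical and do not touch real CD records.

JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker
GIF_TRAILER = b"\x3b"   # GIF trailer byte


def split_segments(data: bytes, count: int) -> list:
    """Split raw image data into roughly `count` equal-sized segments."""
    size = -(-len(data) // count)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def drop_trailing_half(segments: list) -> list:
    """Simulate the breakage: lose the last half of the segments."""
    keep = len(segments) - len(segments) // 2
    return segments[:keep]


def reassemble(segments: list, terminator: bytes) -> bytes:
    """Join what is left and make sure the data ends with the format's end marker."""
    data = b"".join(segments)
    if not data.endswith(terminator):
        data += terminator
    return data
```

With six segments, `drop_trailing_half` keeps the first three, which matches the "minus half the image segments" cases shown below.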
 
As you can see, many "fixed" images aren't perfect, but they are often legible enough to read. If the information on that image is critical, wouldn't you rather see one of those than the white box below?
 
 
Broken images (all look like this in Notes)
 
Inline JPEG image
 
 
Fixed JPEG image (minus half the image segments)
 
Inline JPEG image
 
 
Fixed non-interlaced GIF (minus half the image segments)
 
Inline JPEG image
 
Fixed interlaced GIF (minus half the image segments)
 
Inline JPEG image

Copyright © 2017 Genii Software Ltd.


Tue 22 Aug 2017, 11:41 AM
I got a few questions from people at MWLUG about what CoexLinks Fidelity is, and what exactly 'email fidelity' might look like. Rather than just talk, you can try for yourself. Note: our newer CoexLinks Migrate and CoexLinks Journal products use the same rendering engine internally. If you submit your email address below, you will get twelve messages to the email address you used. Six will be sent using the normal Domino 9.0.1 email engine, then the same six will be sent rendered by CoexLinks Fidelity. Compare and decide whether your company and your clients would be well served with email fidelity, or whether your migration would be safer with it. If you want to try CoexLinks Fidelity for yourself, just fill out an evaluation request, and we'll get you set up. To find out about the other two products, click on the links for each product above.
 
No, we won't spam you endlessly if you give us your email. It is only used for this demo and a quick follow up afterwards to make sure your questions have been answered.
 



Wed 16 Aug 2017, 05:01 PM
As part of our Defect Detection feature, we encounter and usually fix a number of different defects in images. The following list shows the different scenarios we detect, each of which causes some issue with extracting or rendering the images. Of these 31, at least 27 have been encountered in actual customer documents. The other four are left in because they might be encountered some day. While none of these problems are common, some are likely to be encountered in any large data repository, especially in mail which has been converted back and forth between MIME and rich text (as with replies or forwards).
 
A few of these problems are completely unfixable, but only a few. For example, with scenarios 1a and 17, we have no image information to work with at all.
 
Many of these are easily fixed. For example, the correct values for scenarios 25 and 26 are easily deduced from the data.
 
The rest can be partially fixed. Part of the image may have low resolution, or a section may be missing, but usually the image is at least intact enough to identify.
 
1) Image header lists n segments, but there are only m (where m < n)
1a) Image header lists n segments, but there are none
2) Image header lists n segments, but there are m (where m > n)
3) Image segments contain multiple GIF starts (often associated with #2)
4) Image segments occur with no image header
5) Multiple image headers
6) Image header lists data size other than actual (except where PNG)
7) Win meta header lists n segments, but there are only m (where m < n)
8) Win meta lists n segments, but there are m (where m > n)
9) Win meta segments occur with no Win meta header
10) Multiple win meta headers
11) Win meta header lists data size other than actual
12) Bitmap header lists n segments, but there are only m (where m < n)
13) Bitmap lists n segments, but there are m (where m > n)
14) Bitmap segments occur with no Bitmap header
15) Multiple bitmap headers
16) Bitmap header lists data size other than actual
17) No headers or segments
18) Both image segment or win meta and bitmap
19) Image resource that is missing
20) Pseudo-attached image that is missing
21) Image segments occur before image header
22) Win meta segments occur before win meta header
23) Bitmap segments occur before bitmap header
24) No graphic record but other graphic elements
25) Graphic record says resized, but has 0 width or height
26) PNG record missing
27) PNG header lists n segments, but there are only m (where m < n)
28) PNG header lists n segments, but there are m (where m > n)
29) Multiple PNG headers
30) Image segments contain multiple PNG starts (often associated with #28)
31) PNG header lists data size other than actual
 
Phew! As you can see, there are a lot of potential issues, but with Defect Detection, we can fix the majority. That's why we do it.
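
As a rough illustration of how the count-based scenarios in the list above differ from one another, here is a small Python sketch. The `ImageRun` structure and `classify` function are hypothetical and look at only one kind of record; real detection has to walk the actual CD records, and scenario 17 in particular spans image, win meta and bitmap records rather than just one kind.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImageRun:
    """Hypothetical stand-in for one image's header plus its data segments."""
    declared_segments: Optional[int]  # None means no header was found
    segments: List[bytes] = field(default_factory=list)


def classify(run: ImageRun) -> Optional[str]:
    """Return the scenario number from the list above, or None if counts agree."""
    actual = len(run.segments)
    if run.declared_segments is None:
        # No header at all: either nothing to work with, or orphaned segments.
        return "17" if actual == 0 else "4"
    if actual == run.declared_segments:
        return None
    if actual == 0:
        return "1a"
    return "1" if actual < run.declared_segments else "2"
```

For example, a header that declares six segments followed by only three would be classified as scenario 1, which is the case demonstrated with the broken JPEG.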
 


Tue 15 Aug 2017, 12:53 PM
CoexLinks family of products: CoexLinks Fidelity, CoexLinks Migrate and CoexLinks Journal
 
Very soon, we are releasing a new version of all three of our CoexLinks products: CoexLinks Fidelity, CoexLinks Migrate and CoexLinks Journal. Aside from other features and bug fixes, they will share a new feature called Defect Detection. While the challenge for most document rendering (to MIME in this case) is faithfully reproducing the content of the email and including the envelope information in the desired form, some Notes emails have corruptions and defects which make the job harder.
 
There are four major defects (and a few smaller ones not worth mentioning):

  • Broken inline images. A variety of corruptions in images including zero-length data, missing image segments and incorrect image type (e.g., a GIF is marked as a JPEG) leave images broken in both the Notes client and the rendered document. We are able to detect and repair or partially repair about 75% of these corrupted images.
  • Compressed attachments with incorrect sizes. These are difficult to detect because you can open the attachment or save it to disk from the Notes client, so you don't know you have an issue. But since the uncompressed size is incorrect, the document will be truncated and corrupted when emailed or when it is rendered by most tools including the Domino rendering engine. We can fix 100% of these corruptions.
  • Hotspots with invalid ends. In some versions of Notes, URL hotspots and other hotspots inside sections or table cells were left without a closing record. While they appear fine in Notes, they render with either large parts of the Body content missing, or with everything to the end showing as a URL link. We can fix about 95% of these corruptions.
  • Invalid stored image URLs. These corruptions are an artifact of the external MIME to internal MIME rendering, so they mostly appear with received MIME emails or forwarded/replied to MIME emails. The fix is fairly simple, so we can fix 100% of these corruptions.
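
The "incorrect image type" case in the first bullet can be illustrated with a short Python sketch that compares the declared type against the well-known leading bytes of each format. This is an assumption about how such a check could be done, not a description of the CoexLinks implementation, and the function names are invented for the example.

```python
from typing import Optional

# Well-known leading bytes ("magic numbers") for three common image formats.
SIGNATURES = {
    "gif": (b"GIF87a", b"GIF89a"),
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Guess the image format from its first bytes, or None if unrecognized."""
    for name, prefixes in SIGNATURES.items():
        if data.startswith(prefixes):
            return name
    return None


def describe_defect(declared: str, data: bytes) -> Optional[str]:
    """Return a short description of the defect, or None if nothing looks wrong."""
    if not data:
        return "zero-length data"
    actual = sniff_image_type(data)
    if actual is None:
        return "unrecognized image data"
    if actual != declared.lower():
        return f"marked as {declared} but data is {actual}"
    return None
```

A GIF stored under a JPEG label would come back as "marked as jpeg but data is gif", which is the kind of mismatch that can be corrected simply by trusting the data over the label.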
 
Whether you are sending email to customers, reading your own mail from a mobile or web interface, migrating an entire database or journaling mail to a third party vault, it is better to have defect detection in place so that the unusual does not become the irretrievable.
 


Fri 21 Jul 2017, 03:56 PM
Most IBM Notes/Domino customers who have used the product for a number of years have vast stores of data, but when they want to try to glean new insights, they are stymied by how to handle the data mining. Simple fields which map well to views are easy to extract, and are often relatively "clean", meaning that the value is what the value says it should be. But real applications, especially those built for internal use, often reflect a far more complex set of relationships. They may use parent-child hierarchies, doclinks, lookups to other databases. They may also contain information stored in multi-value fields or rich text fields that require manipulation and cleanup. 
 
While there are a number of techniques available from DXL to data scraping, it can quickly become programming intensive to find information and put it together. With this in mind, we have built a fairly easy database using the Midas LSX engine to extract, correlate and prepare data from different sources and build a result which does not always have a one-to-one correspondence with Notes documents. The main virtue of this approach is the ease with which you can ask questions and put together sources. If you decide you have something wrong or need something else, it takes just a minute to remove or add it.
 
I wanted to show how this works with an existing application used over a period of years by fairly sophisticated Notes users. I chose as a source the IBM Business Partner forums, because they are widely available and familiar. One of the different uses for these forums over several years was to allow partners to file Possible Bug reports, which IBMers could monitor and use to create SPRs and so forth. In this brief video, I pose five questions of this fairly simple application. Imagine how you could use a similar application to delve into your company's data.
 
 
Note that I don't talk much in the video about data cleaning, but if you look at the image below, you will see that column F (first red arrow) is derived automatically by Midas as a boolean from column G (second red arrow). We have some data cleaning built in as options, but are also looking at ways to provide custom data cleaning and normalization for individual items. While it is inevitable that some data cleaning will be done after the data is loaded into data analysis or data visualization software, the cleaner it can be the better, as 80% of all time spent on data analytics goes to preparing, cleaning and normalizing the data. We are eager to discuss with customers how we can minimize that costly effort.
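
To give a sense of what deriving a boolean column from a text column involves, here is a small Python sketch. It does not show how Midas does it; the column names, the accepted words and the decision to treat anything unrecognized as unknown are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Assumed vocabularies; a real data set would need its own.
TRUE_WORDS = {"yes", "y", "true", "1"}
FALSE_WORDS = {"no", "n", "false", "0"}


def to_bool(value: str) -> Optional[bool]:
    """Normalize a free-text value to True/False, or None when it is unclear."""
    cleaned = value.strip().lower()
    if cleaned in TRUE_WORDS:
        return True
    if cleaned in FALSE_WORDS:
        return False
    return None


def add_boolean_column(rows: List[Dict[str, str]], source: str, target: str) -> List[dict]:
    """Return copies of the rows with `target` derived from the text in `source`."""
    return [{**row, target: to_bool(row.get(source, ""))} for row in rows]
```

Keeping the unclear values as `None` rather than guessing leaves a visible marker for whatever cleaning still has to happen later in the analysis tool.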
 
 


Tue 20 Jun 2017, 02:19 PM
As software vendors or application developers or anyone else who documents software or processes, we often face the need to come up with an example. The goal of almost any example or documentation is to be simple enough for the uninitiated to grasp while being complex enough to show the possibilities. This is often accomplished with more than one example, so that we can show both how easy it is with one example and how powerful and flexible it is with another.
 
But there is an interesting question of responsibility raised by examples. Are we responsible for those people who just grab the example and go with it, even if they should be modifying it? A classic, and rather extreme, case might be when your example includes "YourServer" or "YourDB.nsf" or even "Firstname Lastname". While it might lead to an embarrassing support call, the implications of someone actually using such an example verbatim are slight. Most likely, the process or software won't work until they plug in an appropriate value.
 
There is one class of example which is different. This is the case of somebody using an example with a password or encryption key that is intentionally weak. I read today that 15% of IoT users leave the default password, and we have all known users who use 12345 as a password or key. While it is clearly the responsibility of the user to be more secure, do we have a responsibility to encourage security? It is not a simple question, as even if we do, and use a complex password or key, that password or key is usually static in the documentation, and so inherently insecure.
 
The following comes from the OpenSSL wiki. It comes with a clear warning not to use that key, which is good, but it intentionally uses one of the very few weak DES keys, which seems an odd choice. Since the user is not meant to type the example exactly, why not use a more random, secure key? But if they did, would that be false security since it was static? In a perfect world, the key used in the example might be random and generated on the fly so that every viewer saw a different key. Then, if the example were copied and pasted, a "good" key would be used. But is that really the responsibility of the documentation writer? I don't know.
 
Inline JPEG image
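
For what it is worth, the "generated on the fly" idea is not hard to sketch. The Python below assumes a documentation page is rendered from a template containing a `{{KEY}}` placeholder; both the placeholder and the function names are hypothetical, and the 8-byte length is chosen only because DES keys are 8 bytes long.

```python
import secrets


def random_key_hex(num_bytes: int = 8) -> str:
    """Return a fresh random key as a hex string, using the OS random source."""
    return secrets.token_hex(num_bytes)


def render_example(template: str) -> str:
    """Fill the (hypothetical) {{KEY}} placeholder with a newly generated key."""
    return template.replace("{{KEY}}", random_key_hex())
```

Each call to `render_example` produces a different key, so a reader who copies and pastes gets something unpredictable, although a key shown on a web page is still only as private as the page view itself.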
 
