Ben Langhinrichs

Photograph of Ben Langhinrichs

E-mail address - Ben Langhinrichs

Recent posts

Thu 18 Jul 2019

HCL - Summary in three words

Wed 3 Jul 2019

HCL is doing so many things right, but...

Thu 27 Jun 2019

HCL: Not just features, but follow-through

July, 2019
 01 02 03 04 05 06
07 08 09 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31

Search the weblog

Genii Weblog

Asking questions to better understand your data

Fri 21 Jul 2017, 03:56 PM

by Ben Langhinrichs
Most IBM Notes/Domino customers who have used the product for a number of years have vast stores of data, but when they want to try to glean new insights, they are stymied by how to handle the data mining. Simple fields which map well to views are easy to extract, and are often relatively "clean", meaning that the value is what the value says it should be. But real applications, especially those built for internal use, often reflect a far more complex set of relationships. They may use parent-child hierarchies, doclinks, lookups to other databases. They may also contain information stored in multi-value fields or rich text fields that require manipulation and cleanup. 
While there are a number of techniques available from DXL to data scraping, it can quickly become programming intensive to find information and put it together. With this in mind, we have built a fairly easy database using the Midas LSX engine to extract, correlate and prepare data from different sources and build a result which does not always have a one-to-one correspondence with Notes documents. The main virtue of this approach is the ease with which you can ask questions and put together sources. If you decide you have something wrong or need something else, it takes just a minute to remove or add it.
I wanted to show how this works with an existing application used over a period of years by fairly sophisticated Notes users. I chose as a source the IBM Business Partner forums, because they are  widely available and familiar. One of the different uses for these forums over several years was to allow partners to file Possible Bug reports, which IBMers could monitor and use to create SPRs and so forth. In this brief video, I pose five questions of this fairly simple application. Imagine how you could use a similar application to delve into your company's data. 
Note that I don't talk much in the video about data cleaning, but if you look at the image below, note that the column F (first red arrow) is derived automatically by Midas as a boolean from column G (second red arrow). We have some data cleaning built in as options, but are also looking at ways to provide custom data cleaning and normalization for individual items. While it is inevitable that some data cleaning will be done after the data is loaded into data analysis or data visualization software, the cleaner it can be the better, as 80% of all time doing data analytics is preparing and cleaning and normalization the data. We are eager to discuss with customers how we can minimize that costly effort.

Copyright © 2017 Genii Software Ltd.

What has been said:

No documents found