Genii Weblog

Totally tabular (the 'BP' version)

Sun 21 Nov 2004, 08:49 PM

by Ben Langhinrichs
This is the third version of my Totally Tabular post.  This combines parts of the other two but accentuates non-@Midas ideas for totally tabular thinking.  To see more about what @Midas can do, see either the Totally Tabular (the 'Good Parts' version) or Totally Tubular (the 'Technology Wonk' version).

Rich text data is still data
I have been doing a lot of thinking about the future of ad hoc data, particularly in light of the upcoming integration of DB2 with the Notes/Domino data store.  As many of you may know, I have long believed that rich text, and other ad hoc rich data formats such as HTML and Microsoft RTF, have been needlessly and foolishly relegated to the status of blobs, and not treated as true data.  This is likely to continue with the DB2 integration.  I have some thoughts on the matter, and I thought I'd share them with those who are interested.

All data is tabular
Given the assumption that rich text data is still data, there is a further assumption I would like to make, which is that all data is tabular.  (My son objects to the term, as he thinks it should refer to tabs and not tables, but I mean tables) Even a singular item could be treated as a table of information containing just one cell, and any  data that can be iterated through is essentially tabular data with multiple rows in a single column.  Whether this is a list of bullet points, such as:

  1. My first point
  2. My second point
  3. My third point

or a series of URL hotspots, such as:

The websites of IBMMicrosoft and Apple all contain lots of information.

or even a series of images,such as:

Inline GIF imageInline GIF imageInline GIF imageInline GIF image

these are all tabular information, that could be represented as:

My first point
My second point
My third point



Inline GIF image
Inline GIF image
Inline GIF image
Inline GIF image

All tabular data is multidimensional with virtual columns
Now, to stretch a concept even further, I would like to argue that all tabular data is multidimensional.  Even the single item, be it a bullet point or a URL hotspot or an image or whatever, has various attributes, and these attributes make up what I would like to refer to as "virtual columns".  To look at just one example, the URL hotspots could be seen as:


with these virtual columns and many more representing different properties of the elements, in no particular order.  The images might have different virtual columns than the text, but they all have properties or attributes, each of which could be thought of as a column.  Of course, you can't really talk about "column 1" or "column 3", so instead, you would need to think of these as "named columns", such as "column URL" or "column Target".

Accessing tabular data programmatically now
So, if you accept my premise that all data is tabular, and you additionally accept that all tabular data is virtually multidimensional, how can we use that fact?  How can we treat these multidimensional virtual tables that may span a great deal of extraneous contents, such as the additional content between the URL hotspots?  There are probably lots of ways, some of which already peek into programming today.  For example, in JavaScript we can access the document.links array, which contains each URL link on a page (I'm not sure whether this works from the Notes client or not).  With the Notes rich text classes, we can iterate through some of these items as well, although the number of different elements is limited, as is the properties we can retrieve for the elements.  Again, in both JavaScript and LotusScript, the access is iterative and a bit cumbersome, but does allow access to the elements in a tabular way that skips the intervening stuff between the elements.

Accessing tabular data in the future
I would suggest that Notes developers would gain a lot by being able to access all of this rich data with the formula language, using some of the same powerful methods they use now to access the multidimensional tabular data in views, in other words with @DbColumn and @DbLookup.  Non-Notes developers might well want to use SQL statements to accomplish the same thing, but I'll leave that for another post.

Accessing non-table tabular data programmatically
Attempting to "eat my own dogfood", I have done my bit in this regard by adapting our @Midas Formulas to allow @DbColumn and @DbLookup to access these sorts of tabular data.  As you may know, @Midas will let you use @DbColumn to read a column of data from a rich text table.  But if all data is tabular, why should we be constrained to rich text tables?  An example was raised in a wish list item in the Business Partner forum.  Jorick Astrego wrote
When I receive an email with a hyperlink in it that I want to send to someone else or paste somewhere else; I have to open the source and then copy paste it. And to make things worse, the link is extra difficult to copy because of the automatic selection. I just hate it when software automatically selects the whole string when I only need a part of it. I can think for myself thank you.
Now, you might be able to write a script that would handle this by iterating through the rich text and finding the URL hotspots and URLs, and ask the user which one he or she wanted and give back the URL for it.  I don't know that the native Notes LotusScript classes give you enough information, but our Midas Rich Text LSX could do it.  But while you could iterate through the URL hotspots, a shorter and simpler approach would be to use @DbColumn.  This is how I would do this using @Midas.

hotspots := @DbColumn("Midas":"NoCache"; ""; @DocumentUniqueID; "Body":"URLLink *");

Now, to access the "named columns" I mentioned before, you simply specify the column where an @DbColumn normally has it, except with a string instead of a number, so for example:

urls := @DbColumn("Midas":"NoCache"; ""; @DocumentUniqueID; "Body":"URLLink *"; "LinkURL");

which would return the URLs associated in the virtual "named column" called LinkURL.  Finally, assuming your URL hotspots were all uniquely named, you could use @DbLookup to get the specific URL based on the text of the URL hotspot, using:

hotspots := @DbColumn("Midas":"NoCache"; ""; @DocumentUniqueID; "Body":"URLLink *");
textval  := @Prompt([OkCancelList]:[NoSort]; "Get URL"; "Select URL hotspot whose URL you want:"; ""; hotspots);
url := @DbLookup("Midas":"NoCache"; ""; @DocumentUniqueID; "Body":"URLLink *"; textval; "LinkURL");

I go into all this in a good deal more detail in my posts mentioned in the beginning, but I want to extrapolate beyond the @Midas use into other rich text stores and how this same logic could be used.

Accessing non-table tabular data from other rich text stores programmatically
Once we have seen this technique, and how easy and powerful it is, it raises the question of whether we could access other rich text stores such as HTML or XML or Microsoft RTF or MIME.  

For HTML, why not have such functions as:

hotspots := @DbColumn("HTML":"NoCache"; ""; "URLLink *");
textval  := @Prompt([OkCancelList]:[NoSort]; "Get URL"; "Select URL hotspot whose URL you want:"; ""; hotspots);
url := @DbLookup("HTML":"NoCache"; ""; "URLLink *"; textval; "LinkURL");

This theoretical formula would retrieve all the links from the page and let you pick the hotspot and then return the URL value behind it.

For XML, why not process an RSS file with functions such as:

titles := @DbColumn("XML":"NoCache"; ""; "item *"; "title");

Thinking in Googlish terms, for a directory structure, why not search a directory with functions such as:

filesize := @Sum(@DbColumn("DIR":"NoCache"; "h:\public\bin"; "File Ends '.txt' *"; "FileSize"; "Format=Number"));

When confronted by MIME, why not get a list of the image CIDs contained with a call such as:

cids := @DbColumn("MIME":"NoCache"; ""; @DocumentUniqueID; "Body":"image/  *"; "Content-ID");

So, does this sound interesting to anyone other than me?  Should I explore making similar access methods for other rich storage formats.

Copyright 2004 Genii Software Ltd.

What has been said:

No documents found