Thursday, June 18, 2009

Aggregate / Summary

One of my wife's pet peeves is when technologists take a word that already has clear meaning in the English language and then twist the definition to mean something not-quite-close-enough in a technical context. Great case-in-point in the world of data warehousing:

What is the difference between an aggregate and a summary?

I did a Google search and came up with a lot of junk, along with what I would consider some not very good answers (albeit from 2002) from some industry analysts. In fact, David Marco goes so far as to say "Summarized and aggregation are the same thing."


In my gut, there's always been a difference between an aggregate and a summary. So, I decided to try to articulate what the difference is. In doing so, my wife's advice of "just look in the dictionary" came in very handy.


Aggregate:
Comes from the Latin word for "to flock together" or "to flock or group."
Note that nothing in that definition implies any loss of specificity, or denies that the flock or group is still made up of individuals. Rather, the idea is that there is a group of individual things acting together.

Summary:
From the Latin summa, one meaning of which is "total" or "sum," also the "principal or main thing."
Clearly, the idea with this root word is that a level of detail is being removed when the summary of something is presented. Rather than still being individual things, the summary of things is another layer of abstraction that represents the underlying detail.


So, my conclusion is that, in a data warehousing application, the term aggregate can be used to represent an object that brings various other things together in one place, regardless of any change or lack of change in the level of granularity. A summary, on the other hand, has to imply a loss of detail, either through a change in granularity by mathematical means or by reducing the amount of precision in a series of events.


Example 1:
If I have two fact tables, one representing purchase orders and the other invoices, I can create an aggregate that still contains all the same detail, but pre-joins those fact tables together in a new kind of fact. In order for that to be a summary, I also have to roll it up to something higher than the transaction grain.
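
To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (po_fact, invoice_fact, po_line_id, and so on) are made up for illustration, and it assumes the simplest possible case of exactly one invoice line per purchase order line. The first query is the aggregate: the two facts pre-joined at the same transaction grain. The second is the summary: the same join rolled up to the month.

```python
import sqlite3

# Hypothetical fact tables; names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE po_fact (
    po_line_id     INTEGER PRIMARY KEY,
    order_month    TEXT,
    ordered_amount REAL
);
CREATE TABLE invoice_fact (
    po_line_id      INTEGER PRIMARY KEY,
    invoiced_amount REAL
);
INSERT INTO po_fact VALUES
    (1, '2009-05', 100.0),
    (2, '2009-05', 250.0),
    (3, '2009-06', 75.0);
INSERT INTO invoice_fact VALUES
    (1, 100.0),
    (2, 200.0),
    (3, 75.0);
""")

# Aggregate: the two facts brought together, still one row per PO line.
aggregate_rows = conn.execute("""
    SELECT p.po_line_id, p.order_month, p.ordered_amount, i.invoiced_amount
    FROM po_fact p
    JOIN invoice_fact i ON i.po_line_id = p.po_line_id
    ORDER BY p.po_line_id
""").fetchall()

# Summary: the same join, rolled up above the transaction grain.
summary_rows = conn.execute("""
    SELECT p.order_month, SUM(p.ordered_amount), SUM(i.invoiced_amount)
    FROM po_fact p
    JOIN invoice_fact i ON i.po_line_id = p.po_line_id
    GROUP BY p.order_month
    ORDER BY p.order_month
""").fetchall()

print(len(aggregate_rows), "aggregate rows;", len(summary_rows), "summary rows")
```

The aggregate keeps as many rows as the purchase order fact, while the summary has one row per month. In a real warehouse the relationship between orders and invoices is rarely this tidy, so treat this as a sketch of the distinction and not a design.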

Example 2:
If I have a workflow event table that I use to track how long it takes an order to go from ordered to fulfilling to packaging to shipping to billing, I can create a summary that only has ordered and shipping status records. In this case, the level of granularity hasn't really changed, but the level of detail has, so it is a summary.
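
A small Python sketch of the same idea, with an invented order number, status names taken from the example above, and made-up dates. The summary keeps one row per order per status, just like the source table, but drops the intermediate statuses:

```python
from datetime import date

# Hypothetical workflow event rows: (order_id, status, event_date).
events = [
    (1001, "ordered",    "2009-06-01"),
    (1001, "fulfilling", "2009-06-02"),
    (1001, "packaging",  "2009-06-03"),
    (1001, "shipping",   "2009-06-04"),
    (1001, "billing",    "2009-06-05"),
]

# Summary: same kind of row, but only the two statuses we care about.
KEPT_STATUSES = {"ordered", "shipping"}
summary = [event for event in events if event[1] in KEPT_STATUSES]

# The summary still answers "how long from ordered to shipping?"
status_dates = {status: date.fromisoformat(day) for _, status, day in summary}
days_to_ship = (status_dates["shipping"] - status_dates["ordered"]).days

print(summary)
print("days from ordered to shipping:", days_to_ship)
```

Nothing was summed or averaged here, yet the intermediate steps can no longer be recovered from the result, which is why I would still call it a summary.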


I'm open to comments on this, but it seems very straightforward given the dictionary definitions of the terms. I know people may use them differently, but we're a group of data-driven experts. Shouldn't clear and precise definition of terms be something that we strive for?

Monday, June 1, 2009

BI Maturity

This isn't a comment about the maturity of the BI industry, data warehousing methodology, or information visualization tools. This is a thought about how "business intelligence" goes through local cycles within a particular organization. Obviously, there are a lot of different paths that "business intelligence" can follow within any given organization, and that path is greatly influenced by industry and geographic business drivers, technologies, personalities, politics, lack or presence of champions, corporate culture, philosophy, and a myriad of other things.

Sometimes, "BI" takes a turn for the worse as it matures in an organization. It seems like there is a fine, but dramatic, line between "huge success because it is a centralized group of people" and "huge failure because it exists in isolation". The question is, can an organization going through this sort of transition leverage the opportunity to reach a both-and resolution:

  • Yes, the BI organization was headed down a path to failure and irrelevance;
  • And, yes, BI is an even more critical idea than ever in our organization.

Here's how we go forward, successfully leveraging BI expertise in the organization while also distributing an understanding and philosophy about business intelligence that is relevant across the organization. After all, shouldn't every part of the business behave as intelligently as possible?

...Walking on Coals...

I've decided to continue an old B-Eye Blog called Sharpening Stones that I'd started up a few years ago, but to migrate the blog here. No offence intended to the great folks at the B Eye Network.

The URL for this new generation of the blog references the inspiration for it, the lyrics from REM's Exhuming McCarthy. You can reread my explanation back at the old blog in posts 1 and 2.