Monday, February 8, 2010

Living on the Edge

I've been working for a while on explaining the value and importance of Enterprise Information Management.  While I usually get some good ideas from Wikipedia, today's article there has let me down.  As the Wikipedia article notes, Gartner and Forrester have some valuable things to say on the topic.  But I believe the idea can be boiled down to a simple, practical explanation, given one assumption: in a business organization with optimal management of information, no individual business unit will appear to be fully optimized, even though the overall organization is optimized.  Here's why:

Imagine the information flow through an organization as a connected network, with each business unit as a node.  Each edge in the graph represents some type of process (automated or manual) that moves information from one department to another.  The Information Supply Chain model describes how each of those edges has a cost associated with it.  To be worthwhile, each edge must also create some added value (either through reduced effort via automation or through added meaning).  The cost and the benefit of an edge, though, don't typically land on the same departmental bottom line.  Usually the originating business unit pays the cost of additional data collection or manipulation so that a receiving business unit can benefit.

In the VERY simple diagram above, the argument is obvious.  The Admitting department in a hospital has the purpose of collecting information from a patient that other departments will need in order to do their job effectively.  Admitting collects patient information (e.g. contact information, primary care physician, insurance information) once so that other departments can all benefit from it.  Surgery uses the information collected during Admitting to retrieve the patient's medical record and orders.  Billing uses the same patient information plus the additional information about what procedures were performed by Surgery to create invoices to payers (who will use similar information to try to avoid paying the bills).
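To make the cost/benefit argument concrete, here is a toy sketch in Python of the hospital example as a graph.  Everything in it is hypothetical: the `Edge`, `unit_ledgers`, `enterprise_value`, and `drop_outgoing` names and all of the figures are made up for illustration.  It assumes only that each edge charges its cost to the department that sends the information and credits its benefit to the department that receives it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One information flow between business units (hypothetical toy model)."""
    source: str     # unit that pays to collect or prepare the information
    target: str     # unit that benefits from receiving it
    cost: float     # cost charged to the source (made-up units)
    benefit: float  # value credited to the target (made-up units)


def unit_ledgers(edges):
    """Net value per business unit: benefits received minus costs paid."""
    ledger = defaultdict(float)
    for e in edges:
        ledger[e.source] -= e.cost
        ledger[e.target] += e.benefit
    return dict(ledger)


def enterprise_value(edges):
    """Net value to the organization as a whole: sum of (benefit - cost) per edge."""
    return sum(e.benefit - e.cost for e in edges)


def drop_outgoing(edges, unit):
    """Simulate one unit 'optimizing' locally by no longer paying for flows it originates."""
    return [e for e in edges if e.source != unit]


# Hypothetical figures for the Admitting -> Surgery -> Billing example.
edges = [
    Edge("Admitting", "Surgery", cost=5.0, benefit=20.0),
    Edge("Admitting", "Billing", cost=3.0, benefit=12.0),
    Edge("Surgery", "Billing", cost=2.0, benefit=10.0),
]

print(unit_ledgers(edges))      # {'Admitting': -8.0, 'Surgery': 18.0, 'Billing': 22.0}
print(enterprise_value(edges))  # 32.0

local = drop_outgoing(edges, "Admitting")
print(unit_ledgers(local).get("Admitting", 0.0))  # 0.0: Admitting's own books improve
print(enterprise_value(local))                     # 8.0: the organization loses value
```

In this toy model, Admitting looks like a money loser on its own books, yet cutting its edges shrinks the organization's total value.  The sketch simply treats a dropped edge as gone; in practice Surgery and Billing would likely re-collect the information themselves, adding redundant cost that this model doesn't capture.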

It would be inefficient if the Surgery department had to collect from you the information it needs to find your medical record and orders, and then your surgical procedure were followed immediately by a visit from the Billing department to collect the same information again so that it could proceed with coding and billing.  This clearly does happen sometimes: occasionally with good cause, but often because of redundancies between systems, and sometimes because the process wasn't designed to collect a piece of information up front.  For instance, Admitting may not have any reason to ask "do you have any allergies," because that isn't necessary to complete its assignment of "log that the patient arrived and notify Surgery."  So Surgery has to ask the additional questions that are important to it: "have you eaten," "do you have any allergies," etc.  For some of those questions, asking as early in the encounter as possible could save significant time and avoid safety risks.

So it seems to me that Enterprise Information Management is, most importantly, about managing the edges of that graph, and that it has a direct impact on the efficiency of the Information Supply Chain.  The edges between the nodes of the diagram aren't merely straight lines that magically move information from one business process to another.  They are interfaces, systems, and business processes that cost significant time and money and put quality at risk.

Why can't we expect each business unit to simply do what results in optimal collection and movement of information?  Not because business units are maliciously selfish about their time or resources, but because they don't usually have the perspective to fully understand the downstream value of their own operations.  Business units can be counted on to optimize their own internal operations.  Enterprise Information Management is the work that lives between and outside of individual business units and drives overall optimization of the edges that connect them.  It has to be planned and managed explicitly, above and beyond departmental objectives.

Thursday, February 4, 2010

How Not to Clean Data

PREFACE: Even if you aren't familiar with the inside of a hard drive, you'll probably still be troubled by this story from my past.  To answer the obvious questions you'll have after the story: Yes, I really do have a legitimate degree in Electrical Engineering.  No, don't worry, I've never held a job that used that degree in any significant way to design or build products that you might own.

I once had an old hard drive that started making a bit of a racket.  It didn't stop working right away, but I was concerned that there was something wrong with it.  I thought that maybe I could figure out what was wrong if I opened it up and poked around inside.  At that point I'd never seen the inside of a hard drive in real life.  So, I invested in my first set of star-point screwdrivers and carefully disassembled the case of the hard drive.  Even after the screws were out, the metal cover stuck a bit.  It seemed like some kind of seal was holding it closed, so I used a flathead screwdriver to pry it open.

Wow.  Shiny.  Really clean.

I plugged the hard drive back in, while both the computer case and the hard drive case were open, and booted up the computer.  Cool.  It spins!  I watched it spin up, and the computer boot.  Everything working great.  That little moving arm is really neat, too.  It bounces back and forth really quickly!  So, I ran some programs on the computer and started listening for the noises that I thought were signifying an imminent disaster.  The hard drive just sounded rough.  Something like ball bearings worn down or the spindle just getting sticky.  Logically, I got out my WD-40, with the little red straw to make sure I could target the center of the spindle.

Squirt.... Squirt... drip, drip.

Well, let's see if this works for a while.  Maybe that was enough to quiet the drive down.

Things working fine.  Then the head did a seek, ran right through a drip of WD-40, and smeared it across the platter.  Is that bad?  Then the actuator arm started thrashing back and forth, clicking hard against the center of the spindle and back against the outer wall of the case.  Clunk.  Clunk.  CLUNK.  Whirrrrr..rr...r....  Quiet.  Computer locked up.  Hard drive stopped.

Uh oh...

Maybe if I clean that WD-40 off of the platter it will work again?

So, I got out my trusty Goo Gone and a soft rag to remove the extra drips of WD-40 that were now smeared across the top platter of the hard drive.  Rub, rub.  Wipe.  Rub, rub.  Polish.  That looks pretty good.  Let's spin it back up and see.  W....h...i.rrrrrrrrrrr.  OK, that sounds pretty.... CLUNK.  Clunk.  Clunk.  Clunk.  Unplug the computer.

I worked on this for a couple of hours.  I used more Goo Gone.  I used alcohol -- both the rubbing kind to clean the platter and the drinking kind to calm my frustration.  In the end, I was able to get the drive spun up long enough to retrieve some files.  This was still an age when most of my working documents were on floppy disks, because I needed to carry those between different computers.  So, luckily, there was no important data lost.

I've been thinking a lot lately about how "broken" business processes impact data quality and data integrity, and about the ways we try to keep the data inside those disks clean and running smoothly.  Sometimes we look at things from too distant a perspective, with too limited an understanding of the context of the processes we're examining, and we act too quickly and too inexpertly, without taking time to understand the nuances of the systems and business processes involved.  We do things that we think will help (implement governance processes and quality screens) and end up sending the system into a tailspin.  Things do recover from that dive, but not without a major investment of time and energy.

Wednesday, February 3, 2010

New Magic Quadrant for DW DBMS

Gartner released its new Magic Quadrant for Data Warehouse DBMS platforms.  I tend to think that Gartner does a reasonably good job with its Magic Quadrant results.

Here's my personal summary.
  • Oracle holding on to 2nd place for enterprise data warehousing with its "reference configurations," Exadata product, and 11g upgrade as prominent strengths; downside being claims of higher DBA FTE cost than some other DW platforms.
  • Teradata remaining the clear leader and probably expanding their market presence with a newer pricing model and platform options.
  • Microsoft lagging far behind the other big vendors while it tries to integrate the technology it purchased with DATAllegro in 2008.
  • My favorite aspiring open source vendor, InfoBright, sitting in the middle of the niche quadrant as a brand new entrant.

Full disclosure: I don't own stock or have a personal stake in Gartner or any of these companies.  I've personally developed or managed large data warehouses on DB2, Teradata, and Oracle.