Monday, August 23, 2010

Business Keys

I'm engaged in a project that's actively using key concepts from Dan Linstedt's Data Vault methodology.  There are lots of very powerful benefits that we seem to be realizing with this methodology, but this series of blog posts won't be particularly about use of the Data Vault.  Still, I felt it was appropriate to credit the Data Vault for helping provide a structure for our own discovery.

This first post is about the struggle to identify keys for business entities.  We set forth some fundamental principles when we started out on our latest large scale project.  First and foremost, we would "throw nothing away".  What that's meant is that we want to design the foundation of our reporting database to be a reflection not just of one department's truth, but all of the truths that might exist across the enterprise.

As a result, the design of every major entity has run into the same challenge: "what is the business key for this entity?"  Well, from System A, it's the abc code.  From System D it's the efg identifier.  But if someone put in the xyz ID that the government assigns, then you can use System D to get to this industry file that we get updated every 3 months and link that back to an MS Access database that Ms. So-and-so maintains.  Ack!  Clearly we can't just say an apple is an apple.  And clearly there's a data governance issue at play in this scenario also.

In one case, some of these are legacy systems that simply are what they are and aren't worth investing additional time and energy into.

Our data modeling challenge is to determine what the one business key would be for all the different instances of this one logical entity.  When confronted with the challenge of having no clear business key, the project wanted to "just add a source system code and be done with it."  I pushed hard against this approach for a couple of weeks, insisting that the team keep going back and working harder to dig up what could be a true business key.  Eventually, I realized that I was both working contrary to one of the original goals I'd set forth and becoming the primary roadblock to progress.

Interesting side note:  One of the better tricks of good software design is to defer decisions to the last possible minute.  If you get away without writing some piece of code, then best to put it off until you have to write it.  There's obviously some nuance and art to understanding how to leverage that.  The Strategy Pattern is a good example, though.

What I realized I was doing was trying to put a huge and potentially very dynamic piece of logic out in front of our need to simply capture the information that was being created by source systems.  So, we instituted what felt like a completely counter-intuitive design standard:  every business key would include source system code as part of the compound key; and we would defer the need to consolidate and deduplicate instances of an entity until after all the underlying data had first been captured.

Deduplicating is the immediate next process, but this allows to be sure that we've captured all the raw information from the source system first before throwing away the fact that there are different versions of the truth in different source systems.

A very powerful lesson for us that felt very counter-intuitive; that we started considering for the wrong reasons; and finally decided to follow through on for all the right reasons!


  1. We had the same issue with one entity and business keys. We got entitys (people) from different source systems. We had to find a way to merge the systems into one entity with a conformed business key. We created a "staging table" and mapped/merged the different records to a conformed entity with a business key (compound). After this step, we loaded the records into the Datawarehouse. I think this is a good way to archive this goal. You need a conformed business key over the different source system. If there isn't one, create one.
    What says Dan about this?

  2. I think Dan would say you're implementing a Business Data Vault;)

  3. Hi Paul,

    Love to hear more about the struggles, successes, and break-throughs. Perhaps I can offer some light at the end of the tunnel (or at least some expediency) if you run in to an obstacle.

    From the sounding of the post, the keys are definately a struggle for most organizations, and the last thing I'd want anyone to do is run headlong into analysis-paralysis (which is where you were originally headed). On the other hand, often times people forget to "remember" that the Data Vault model is meant to be a flexible, living breathing model.

    And with that, getting something that works today for raw data, and beginning to capture the information is most important. REMEMBERING that the model is FLUID, and will/should change as new metadata definitions are unveiled. Perhaps, never defined - therefore making that portion of the model static in architecture, but flexible in where it LINKS to other components.

    Regarding the business data vault - for me, I need to write more about the definition of what I think a BDV is. Anyhow, the for this to be a BDV I would have to argue that the business rules are changing the content and meaning of the data before it is stored in the Data Vault. That alone would make it a BDV.

    If however, it is simply de-duped, but still loaded raw, I would still call it an auditable raw Data Vault.

    Paul - drop me a line (email first, as I'm on the road next week), we should catch up anyhow.

    Regarding cross-mapping business keys: that's fine, as long as the content and context of the data are not altered. the definition of "conformed" needs to be solidified here. Alas, business keys are slippery by their very nature - unless you are lucky enough to have a SINGLE source system from which your entire business runs. Even then, the minute the key changes, it can present challenges.

    Everyone should look forward to the publication of my Technical DV Modeling book in December this year - it explains some of these foundational concepts.

    Dan Linstedt
    PS: Thank-you everyone for writing such interesting things, I wouldn't be where I am without your support.