This first post is about the struggle to identify keys for business entities. We set forth some fundamental principles when we started out on our latest large-scale project. First and foremost, we would "throw nothing away". What that meant was that we wanted to design the foundation of our reporting database to reflect not just one department's truth, but all of the truths that might exist across the enterprise.
As a result, the design of every major entity has run into the same challenge: "what is the business key for this entity?" Well, from System A, it's the abc code. From System D it's the efg identifier. But if someone put in the xyz ID that the government assigns, then you can use System D to get to this industry file that we get updated every 3 months and link that back to an MS Access database that Ms. So-and-so maintains. Ack! Clearly we can't just say an apple is an apple. And clearly there's a data governance issue at play in this scenario also.
In some cases, these are legacy systems that simply are what they are and aren't worth investing additional time and energy in.
Our data modeling challenge is to determine what the one business key would be for all the different instances of this one logical entity. When confronted with the challenge of having no clear business key, the project team wanted to "just add a source system code and be done with it." I pushed hard against this approach for a couple of weeks, insisting that the team keep going back and working harder to dig up what could be a true business key. Eventually, I realized that I was both working contrary to one of the original goals I'd set forth and becoming the primary roadblock to progress.
Interesting side note: One of the better tricks of good software design is to defer decisions to the last possible minute. If you can get away without writing some piece of code, then it's best to put it off until you have to write it. There's obviously some nuance and art to understanding how to leverage that. The Strategy Pattern is a good example, though.
What I realized I was doing was trying to put a huge and potentially very dynamic piece of logic out in front of our need to simply capture the information that was being created by source systems. So, we instituted what felt like a completely counter-intuitive design standard: every business key would include the source system code as part of a compound key, and we would defer the need to consolidate and deduplicate instances of an entity until after all the underlying data had first been captured.
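To make that standard concrete, here is a minimal sketch in Python of what a compound key might look like. The names (BusinessKey, capture, raw_store) and the sample values are invented for illustration and are not taken from our actual model; the point is only that the source system code is part of the key, so nothing from one system can overwrite something from another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessKey:
    """Hypothetical compound key: source system code plus that system's own key."""
    source_system: str
    source_key: str


# Stand-in for the raw capture layer of the reporting database.
raw_store = {}


def capture(source_system, source_key, attributes):
    """Store a record exactly as the source system sent it, keyed by the compound key."""
    key = BusinessKey(source_system, source_key)
    raw_store[key] = attributes
    return key


# The same real-world entity as seen by two systems, plus an unrelated record
# that happens to reuse a key value in a different system.
capture("SYSTEM_A", "abc-100", {"name": "Acme Corp"})
capture("SYSTEM_D", "efg-7", {"name": "ACME Corporation"})
capture("SYSTEM_D", "abc-100", {"name": "Unrelated Co"})
```

All three records survive side by side, including the two that share the value "abc-100", because the source system code keeps them apart. No matching logic has to exist yet for the capture step to work.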
Deduplicating is the immediate next process, but this approach allows us to be sure that we've captured all the raw information from the source systems first, before throwing away the fact that there are different versions of the truth in different source systems.
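As a rough illustration of deferring that step, the sketch below runs consolidation as a separate pass over rows that have already been captured. The crosswalk table, the entity IDs, and the consolidate function are all assumptions made up for this example; in practice the matching rules are the "huge and potentially very dynamic piece of logic" mentioned above, and they can change without touching the raw data.

```python
from collections import defaultdict

# Rows already captured, each identified by (source system code, source key).
raw_rows = [
    ("SYSTEM_A", "abc-100", {"name": "Acme Corp"}),
    ("SYSTEM_D", "efg-7", {"name": "ACME Corporation"}),
    ("SYSTEM_D", "efg-9", {"name": "Widget Works"}),
]

# Hypothetical crosswalk, maintained separately, that maps compound keys
# to a consolidated entity ID once someone has worked out the match.
crosswalk = {
    ("SYSTEM_A", "abc-100"): "ENTITY-1",
    ("SYSTEM_D", "efg-7"): "ENTITY-1",
}


def consolidate(rows, crosswalk):
    """Group captured rows by consolidated entity without modifying the rows."""
    grouped = defaultdict(list)
    for source_system, source_key, attributes in rows:
        compound_key = (source_system, source_key)
        # Rows with no known match stay as their own entity for now.
        entity_id = crosswalk.get(compound_key, f"{source_system}:{source_key}")
        grouped[entity_id].append((compound_key, attributes))
    return dict(grouped)


entities = consolidate(raw_rows, crosswalk)
```

Each consolidated entity still carries every source version of the record, so the differing "truths" remain visible, and rerunning the pass with a better crosswalk requires no reload of the raw rows.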
This was a very powerful lesson for us: one that felt very counter-intuitive, that we started considering for the wrong reasons, and that we finally decided to follow through on for all the right reasons!