Thursday, November 26, 2009

Correlate: The 2nd "C"

Collecting information from a variety of sources is powerful in and of itself, but having an integrated collection of data makes analysis across business processes, subject areas, and source systems many times more efficient.  The data warehouse must provide an access layer that allows users to easily and naturally merge together pieces of business data that are naturally related in the business model.  There are some aspects of correlation that are business driven and the data warehouse will be a victim or beneficiary of.  For instance, the consistent assignment of employee numbers across applications or an enforced uniqueness of cost centers across multiple financial systems or the corporate data governance decision that only one systems can be considered the authoritative source for any particular data element. 

Other correlation activities are ones that can be supported directly within the data warehouse environment:
  • Code set translations ensure that a user can reference any data source using the approved corporate standard code set for some particular attribute.
  • Cross-reference translates that allow two different identifiers for the same particular entity to be combined together.
  • Transformation of one representation of a particular identifier into a matching representation somewhere else.  For instance, if one system uses a 15 digit cost center that includes accounting unit, account, and sub account, the data warehouse should take that 15 digit string, break it into components, and match those against the information from the finance system.

Having a data warehouse to preexecute these kinds of correlation activities once and store the results for convenient use in tens of thousands of queries every month or even every day is a huge benefit to analysts through clear communication and improved performance.

#PragProWriMo word count: 46,141

No comments:

Post a Comment