Wednesday, December 30, 2009

Whose data is it? (Part 1)

The problem of data ownership:
I want to thank Jim Harris for his great post about The First Law of Data Quality from earlier this month. It's certainly an excellent read.  One of the things it reminded me of is the experiences that I've had working with various other business and IT teams when trying to get access to and understand data from particular systems.  The issue of "data ownership" always seems to play an antagonistic role in any desire to acquire and integrate information from multiple systems.  Generalizing, I think there are three different negative responses you can receive when looking to a new source system for access to their data for integration into a data warehouse:
  • That's the vendor's data.  We wouldn't be able to understand it.
  • That's my data.  I'll tell you what you need.
  • That's not my data.  I don't know anything about it.
This series of blog posts will explore each of those in more detail:

It's the vendor's data...
The "it's the vendor's data" culture has a black-box view of applications: there is no separate understanding of application and information.  So, there's a belief that the only way to interact with the data in the application is to have a vendor or partner create new application features or vendor-supported interfaces.

Individuals who have experience in supporting the end users of applications, and sometimes the administrative configuration of applications, often put significant weight on the features and functionality of the application itself.  From an integration perspective, this significantly limits the kind of integration that can be done with many applications.  For instance, most applications provide users with various types of "report writing" functionality.  In many applications, this is merely an extract-generation feature that allows users to select the fields they're interested in and export that data to a CSV or Excel file.  Application support analysts who are used to supporting users in this way, and who may not have been involved in technical integration work recently, may default to a view that the only way to get information out of that particular application is through these manual extract tools.  Of course, having individuals manually run extracts from a system on a daily basis is not an ideal pattern for sourcing data into a data movement process.

In situations of heavy application reliance, application teams might never think of doing integration directly from the database; or the team might feel that any such endeavor would require many hours of training from the vendor.  Of course, the vendor would appreciate the professional services or training revenue from that.  For most applications, however, it is unnecessary.  Reverse engineering a data model is usually a straightforward exercise, though the effort depends on the complexity of the system.  Reverse engineering the underlying data typically takes more analysis, but having access to both the application and database (in several environments) provides a powerful way to understand both the application and the data.
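As a rough illustration of how little it can take to start reverse engineering a data model, here is a minimal sketch that reads table, column, and foreign key definitions straight from a database catalog. It uses Python's built-in sqlite3 module only so that it runs anywhere; the patient and visit tables and the describe_schema helper are hypothetical names made up for this example, and a real application database would expose the same information through its own catalog views.

```python
import sqlite3


def describe_schema(conn):
    """Return {table: {"columns": [(name, type, is_pk)],
                       "foreign_keys": [(column, ref_table, ref_column)]}}."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: cid, name, type, notnull, dflt_value, pk
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # foreign_key_list rows: id, seq, table, from, to, on_update, ...
        fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        schema[table] = {
            "columns": [(c[1], c[2], bool(c[5])) for c in cols],
            "foreign_keys": [(fk[3], fk[2], fk[4]) for fk in fks],
        }
    return schema


# Hypothetical stand-in for an application database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    last_name  TEXT
);
CREATE TABLE visit (
    visit_id   INTEGER PRIMARY KEY,
    patient_id INTEGER REFERENCES patient(patient_id),
    visit_date TEXT
);
""")

schema = describe_schema(conn)
for table, info in schema.items():
    print(table, info)
```

The catalog only gives you the structure. Working out what the values in those columns actually mean is the part that still needs the application open in another window.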

Enterprise-scale applications typically provide some type of standard interface, whether that is X12, EDI, HL7, or a proprietary way of moving data in and out of the system.  Many modern applications provide a service-oriented API that allows external applications to read from or write to the application.  Standard interfaces and services are an excellent way to retrieve data for a data warehouse.  Where sufficient interfaces are lacking, though, database to data warehouse integration is still a common, powerful, and appropriate integration model.
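To make the database to data warehouse model a little more concrete, here is a minimal sketch of an incremental extract: each run copies only the rows that changed since the previous run into a staging table. Everything specific in it is an assumption for illustration, including the visit and stg_visit tables, the updated_at change-tracking column, and the extract_changes helper. Both databases are in-memory SQLite so the example is self-contained.

```python
import sqlite3


def extract_changes(source, staging, last_loaded):
    """Copy rows changed after `last_loaded` into staging.

    Returns (rows_copied, new_high_water_mark).
    """
    rows = source.execute(
        "SELECT visit_id, patient_id, updated_at FROM visit "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_loaded,),
    ).fetchall()
    # INSERT OR REPLACE keeps staging current when a source row is updated.
    staging.executemany(
        "INSERT OR REPLACE INTO stg_visit (visit_id, patient_id, updated_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    staging.commit()
    new_mark = rows[-1][2] if rows else last_loaded
    return len(rows), new_mark


# Hypothetical source application database.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE visit (visit_id INTEGER PRIMARY KEY, patient_id INTEGER, updated_at TEXT);
INSERT INTO visit VALUES (1, 10, '2009-12-01T08:00:00');
INSERT INTO visit VALUES (2, 11, '2009-12-02T09:30:00');
INSERT INTO visit VALUES (3, 10, '2009-12-03T14:15:00');
""")

# Hypothetical warehouse staging area.
staging = sqlite3.connect(":memory:")
staging.execute(
    "CREATE TABLE stg_visit (visit_id INTEGER PRIMARY KEY, patient_id INTEGER, updated_at TEXT)"
)

# First run loads everything; the second picks up only the changed row.
first_run = extract_changes(source, staging, "1970-01-01T00:00:00")
source.execute(
    "UPDATE visit SET patient_id = 12, updated_at = '2009-12-04T10:00:00' WHERE visit_id = 2"
)
second_run = extract_changes(source, staging, first_run[1])
print(first_run, second_run)
```

This relies on the source reliably maintaining a change timestamp, which is not a given, and the strict greater-than comparison could miss a row that shares a timestamp with the previous high-water mark. Those are exactly the kinds of details the analysis of the underlying data needs to settle.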

Convincing application teams that it is feasible to connect directly to an application database, without engaging and paying for professional services from the vendor, can be challenging.  One effective way to break through part of that cultural difference is to do a proof of concept and show how the technology and analysis can work to create custom integration solutions without necessarily needing vendor intervention. A quick win that proves the mysterious data behind some application really is just letters and numbers goes a long way toward building confidence for more substantial integration work.

1 comment:

  1. Nice post.
    I do believe that Data Access APIs should receive some coverage re. Data Movement aspect of this post though :-)

    This post is exhibit 1 re. the value proposition of the Open Database Connectivity (ODBC) API. It's also an exemplar for why Native DBMS APIs simply will never cut it based on failure to separate Data Access API and DBMS engine.